I am trying to do relative pose estimation in a stereo setup using OpenCV: I compute the pose from the essential matrix and then reconstruct the scene with cv::triangulatePoints. Since this gives a reconstruction only up to an arbitrary scale, I thought I could compute the subsequent scale factors if the initial baseline (~ scale) is known (the cameras are purely translated in X). But even when the cameras do not move and the scene does not change, each pair of images I take produces a reconstruction at a different scale. I know this because I take two 3D points and compare the distance between them across reconstructions: the whole scene is uniformly magnified or shrunk, so the reconstruction itself is not wrong per se.
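For concreteness, here is roughly what I do per image pair (a minimal sketch, not my exact code; `pts1`, `pts2`, and `K` stand in for my matched 2D points and the camera intrinsics):

```cpp
#include <opencv2/calib3d.hpp>
#include <opencv2/core.hpp>
#include <vector>

// Reconstruct one image pair and return the distance between the first two
// triangulated points -- the quantity that comes out different per pair.
double reconstructAndMeasure(const std::vector<cv::Point2f>& pts1,
                             const std::vector<cv::Point2f>& pts2,
                             const cv::Mat& K)            // 3x3, CV_64F
{
    // Essential matrix + relative pose. recoverPose returns t as a unit
    // vector, so the baseline (and hence the whole scene) has no scale.
    cv::Mat E = cv::findEssentialMat(pts1, pts2, K, cv::RANSAC);
    cv::Mat R, t;
    cv::recoverPose(E, pts1, pts2, K, R, t);

    // Projection matrices P1 = K[I|0], P2 = K[R|t] for triangulation.
    cv::Mat P1 = K * cv::Mat::eye(3, 4, CV_64F);
    cv::Mat Rt;
    cv::hconcat(R, t, Rt);
    cv::Mat P2 = K * Rt;

    cv::Mat pts4d;
    cv::triangulatePoints(P1, P2, pts1, pts2, pts4d);
    pts4d.convertTo(pts4d, CV_64F);

    // Dehomogenize two points and measure their separation.
    cv::Mat A = pts4d.col(0) / pts4d.at<double>(3, 0);
    cv::Mat B = pts4d.col(1) / pts4d.at<double>(3, 1);
    return cv::norm(A.rowRange(0, 3), B.rowRange(0, 3));
}
```

Running this on different image pairs of the same static scene gives me a different return value each time.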
To get around this, I tried something naive: I computed the ratio of the scale change between pair 1 and pair 2, and did the following:
1. Say the distance between 3D points A and B in pair 2 is s times the distance between the same points in pair 1.
2. Rescale the whole scene obtained from pair 2 by s.
3. Run a Perspective-n-Point algorithm (solvePnP in OpenCV) with these rescaled 3D points and the 2D correspondences from pair 2 (sketched in the code below).
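In code, steps 1-3 look roughly like this (hypothetical names: `X1`/`X2` are the triangulated 3D points from pair 1 and pair 2 in corresponding order, `x2` the matched 2D points of pair 2, `iA`/`iB` the indices of points A and B):

```cpp
#include <opencv2/calib3d.hpp>
#include <vector>

void rescaleAndPnP(const std::vector<cv::Point3d>& X1,  // scene from pair 1
                   std::vector<cv::Point3d> X2,         // scene from pair 2 (copied)
                   const std::vector<cv::Point2d>& x2,  // 2D points of pair 2
                   const cv::Mat& K)
{
    // Step 1: ratio s such that dist(A,B) in pair 2 = s * dist(A,B) in pair 1.
    const int iA = 0, iB = 1;  // my two reference points (placeholder indices)
    cv::Vec3d d2 = X2[iA] - X2[iB], d1 = X1[iA] - X1[iB];
    double s = cv::norm(d2) / cv::norm(d1);

    // Step 2: rescale the whole pair-2 scene so its A-B distance matches pair 1.
    for (cv::Point3d& X : X2)
        X *= 1.0 / s;

    // Step 3: PnP with the rescaled 3D points and pair 2's 2D correspondences.
    cv::Mat rvec, tvec;
    cv::solvePnP(X2, x2, K, cv::noArray(), rvec, tvec);
    // I expected to recover t1 here, but the translation comes back rescaled.
}
```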
But the answer I get is s*t1, where t1 is the translation in the first case. This confuses me greatly, because the scene has barely changed, so the 2D point correspondences between pair 1 and pair 2 have barely changed either. Is it still impossible to determine the scale of a reconstruction even though I know the initial "scale" for sure (I thought this would be straightforward)? Why does the reconstruction come out at an arbitrary scale at every step, and how does PnP "magically" know that I have done this whole rescaling?
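Writing out the pinhole projection makes the ambiguity I seem to be hitting explicit: for any scale factor s,

```latex
\lambda\, x = K\,(R X + t)
\quad\Longrightarrow\quad
(s\lambda)\, x = K\,\bigl(R\,(s X) + s\, t\bigr)
```

i.e. the scaled scene sX together with the scaled translation st reprojects to exactly the same pixels x. That would at least be consistent with solvePnP handing back a scaled translation after I rescale the 3D points, but I would like to understand whether this really rules out recovering the scale from a known initial baseline.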