Hello, I am trying to use the PNP algorithm implementations in Open CV (EPNP, Iterative etc.) to get the metric pose estimates of cameras in a two camera pair (not a conventional stereo rig, the cameras are free to move independent of each other). My source of images currently is a robot simulator (Gazebo), where two cameras are simulated in a scene of objects. The images are almost ideal: i.e., zero distortion, no artifacts.
So to start off, this is my first pair of images.
I assume the right camera as "origin". In metric world coordinates, left camera is at (1,1,1) and right is at (-1,1,1) (2m baseline along X). Using feature matching, I construct the essential matrix and thereby the R and t of the left camera w.r.t. right. This is what I get.
R in euler angles: [-0.00462468, -0.0277675, 0.0017928]
t matrix: [-0.999999598978524; -0.0002907901840156801; -0.0008470441900959029]
Which is right, because the displacement is only along the X axis in the camera frame. For the second pair, the left camera is now at (1,1,2) (moved upwards by 1m).
Now the R and t of left w.r.t. right become:
R in euler angles: [0.0311084, -0.00627169, 0.00125991]
t matrix: [-0.894611301085138; -0.4468450866008623; -0.0002975759140359637]
Which again makes sense: there is no rotation; the displacement along Y axis is half of what the baseline (along X) is, so on, although this t doesn't give me the real metric estimates.
So in order to get metric estimates of pose in case 2, I constructed the 3D points using points from camera 1 and camera 2 in case 1 (taking the known baseline into account: which is 2m), and then ran the PNP algorithm with those 3D points and the image points from case 2. Strangely, both ITERATIVE and EPNP algorithms give me a result that looks like this:
Pose according to final PNP calculation is:
Rotation euler angles: [-9.68578, 15.922, -2.9001]
Metric translation in m: [-1.944911461358863; 0.11026997013253; 0.6083336931263812]
Am I missing something basic here? I thought this should be a relatively straightforward calculation for PNP given that there's no distortion etc. ANy comments or suggestions would be very helpful, thanks!