I am currently using epipolar-geometry-based pose estimation to estimate the pose of one camera w.r.t. another, with a non-zero baseline between the cameras. I am using the five-point algorithm (implemented as findEssentialMat in OpenCV) to determine the up-to-scale translation vector and rotation matrix between the two cameras.
I have found two interesting problems when working with this; it would be great if someone could share their views (I don't have a strong theoretical background in computer vision):
- If the rotation of the camera is about the Z axis, i.e., in the plane parallel to the scene, and the translation is non-zero, the translation between camera1 and camera2 (which is along X in the real world) is wrongly estimated to be along Z. Example case: cam1 and cam2 spaced approx 0.5 m apart on the X axis, cam2 rotated clockwise by 45 deg.
Image pair
Output:
Translation vector is [-0.02513, 0.0686, 0.9973] (wrong, should be along X)
Rotation Euler angles: [-7.71364, 6.0731, -43.7583] (correct)
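For reference, the ground-truth essential matrix for this setup can be built directly as E = [t]x R (using the 0.5 m X baseline and 45 deg clockwise Z rotation from above); a short numpy sketch:

```python
import numpy as np

# Ground-truth geometry for the example: 0.5 m baseline along X,
# 45 deg clockwise rotation about Z.
theta = np.deg2rad(-45.0)                     # clockwise about Z
R = np.array([[np.cos(theta), -np.sin(theta), 0.0],
              [np.sin(theta),  np.cos(theta), 0.0],
              [0.0,            0.0,           1.0]])
t = np.array([0.5, 0.0, 0.0])                 # baseline along X
t_x = np.array([[0.0,  -t[2],  t[1]],
                [t[2],  0.0,  -t[0]],
                [-t[1], t[0],  0.0]])         # skew-symmetric [t]x
E = t_x @ R                                   # essential matrix E = [t]x R
# A valid essential matrix has rank 2 with two equal singular values:
s = np.linalg.svd(E, compute_uv=False)
```

The singular values come out as (0.5, 0.5, 0), i.e., the two non-zero ones equal the baseline length, which is the standard validity check for an essential matrix.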
- The geometry between image1 and image2 is not the exact inverse of the geometry between image2 and image1. While estimating from image2 to image1 produces the correct translation, image1 to image2 is way off. Example below, where camera2 was displaced along X and rotated by ~30 degrees about Y:
Image 1 to image 2
Output: Rotation [-1.578, 24.94, -0.1631] (Close) Translation [-0.0404, 0.035, 0.998] (Wrong)
Image 2 to image 1
Output: Rotation [2.82943, -30.3206, -3.32636] Translation [0.99366, -0.0513, -0.0999] (Correct)
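This is how I compare the two directions: if (R12, t12) maps camera1 coordinates into camera2 (x2 = R12 x1 + t12), then the reverse direction should be R21 = R12^T and t21 = -R12^T t12, up to the unknown scale. A small numpy sketch with a hypothetical pose (30 deg about Y, not my measured values):

```python
import numpy as np

def invert_pose(R, t):
    """Invert a rigid transform: x1 = R.T @ (x2 - t)."""
    return R.T, -R.T @ t

# Hypothetical example pose: 30 deg rotation about Y plus a translation.
a = np.deg2rad(30.0)
R12 = np.array([[ np.cos(a), 0.0, np.sin(a)],
                [ 0.0,       1.0, 0.0      ],
                [-np.sin(a), 0.0, np.cos(a)]])
t12 = np.array([1.0, 0.0, 0.2])
R21, t21 = invert_pose(R12, t12)

# Composing a pose with its inverse must give the identity transform.
R_id = R21 @ R12
t_id = R21 @ t12 + t21
```

My expectation was that the two estimated directions would satisfy this relation (up to scale on the translations), which is what the second example above fails to do.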
Looks like it has no issues figuring out the rotations, but the translations are hit or miss.
As to question 1, I was initially concerned that, because the rotation is about the Z axis, the points might appear to be all coplanar. But the five-point algorithm paper specifically states: "The 5-point method is essentially unaffected by the planar degeneracy and still works".
Thank you for your time!