Epipolar geometry: pose estimate detects translation in wrong axis
I am trying to use epipolar geometry, via OpenCV's findEssentialMat, recoverPose, and solvePnP functions, to estimate the relative motion between two images. I am currently using a simulator to get my camera images, so the distortion is zero and I know the camera parameters beforehand. I am using the OpenCV coordinate convention in this question.
When two images are taken from positions displaced only along the X axis, essential-matrix-based estimation works perfectly: I get a translation vector like [-0.9999, 0.0001, 0.0001]. But when the camera is also moved along the forward/backward Z axis, the t vector contains a much larger value than expected on the Y axis, whereas X and Z are correct. Example results from solvePnP for these two cases are:
- Image 1 taken from (0, 0, 0), image 2 from (-2, 0, 0):
t = [-1.988, 0.023, 0.046] (Y and Z are acceptably close to 0)
- Image 1 taken from (0, 0, 0), image 2 from (-2, 0, 4):
t = [-2.028, -0.441, 4.0983] (Y is totally off)
I don't understand where the 0.4 m on the Y axis is coming from. The simulator is very accurate in terms of physics, and there is no movement of that magnitude on the Y axis. The same behavior shows up in the relative R/t output from the essential matrix decomposition. Any tips on solving this, or suggestions for debugging it further, would be helpful.
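One check that might help with debugging: since the essential matrix only gives the translation up to scale, it may be more telling to compare the estimated and true translations as directions rather than per-axis. Below is a minimal sketch in plain Python that computes the angle between the two vectors for the cases listed above. The helper name angle_deg is made up for illustration, and the ground-truth vectors are simply the simulator positions quoted above, assuming t is expressed in the same frame as those positions.

```python
import math

def angle_deg(a, b):
    """Angle in degrees between two 3-vectors (hypothetical helper)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    # Clamp to [-1, 1] so rounding error cannot break acos.
    cos_angle = max(-1.0, min(1.0, dot / (norm_a * norm_b)))
    return math.degrees(math.acos(cos_angle))

# (estimated t from solvePnP, ground-truth displacement from the simulator)
cases = {
    "X only": ([-1.988, 0.023, 0.046], [-2.0, 0.0, 0.0]),
    "X and Z": ([-2.028, -0.441, 4.0983], [-2.0, 0.0, 4.0]),
}

for name, (estimated, truth) in cases.items():
    print(f"{name}: direction error = {angle_deg(estimated, truth):.2f} deg")
```

With the numbers above this gives roughly 1.5 degrees for the X-only case and roughly 5.5 degrees for the X-and-Z case, so the second estimate is noticeably worse in direction, not just larger in absolute terms because the baseline is longer.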
EDIT:
I was reading through David Nister's paper about the 5 point algorithm which is the core of findEssentialMat, and it says this:
"The 5-point method significantly outperforms the other non-iterative methods for sideways motion. The 5-point results are quite good, while the results of the other non-iterative methods are virtually useless. As the noise grows large, the other methods are swamped and begin placing the epipole somewhere inside the image regardless of its true position. This phenomenon has been observed previously by e.g. [15]. It is particularly marked for the 6 and 8 point methods. For the forward motion cases, the results are quite different however. This is partly due to a slight deterioration of the results for the 5-point method, but mainly due to a vast improvement of the results for the other methods. In particular the 8-point method gives excellent results on forward motion".
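To relate the quote to my two cases, here is a small sketch of where the true epipole falls in image 1 for each displacement. It projects the second camera center with a pinhole model. The intrinsics (fx = fy = 320, cx = 320, cy = 240 for a 640x480 image) are placeholder values for illustration, not my actual simulator parameters, and the function name epipole_in_first_image is made up.

```python
def epipole_in_first_image(center2, fx, fy, cx, cy):
    """Project camera 2's center (given in camera 1's frame) into image 1.

    Returns None when the center has no depth (Z == 0), i.e. the epipole
    lies at infinity, which is the pure sideways-motion case.
    """
    x, y, z = center2
    if abs(z) < 1e-9:
        return None
    return (fx * x / z + cx, fy * y / z + cy)

# Placeholder intrinsics (assumed values, not from the simulator).
FX, FY, CX, CY = 320.0, 320.0, 320.0, 240.0
WIDTH, HEIGHT = 640, 480

for center in [(-2.0, 0.0, 0.0), (-2.0, 0.0, 4.0)]:
    e = epipole_in_first_image(center, FX, FY, CX, CY)
    if e is None:
        print(center, "-> epipole at infinity (sideways motion)")
    else:
        inside = 0 <= e[0] < WIDTH and 0 <= e[1] < HEIGHT
        print(center, "-> epipole at", e, "inside image:", inside)
```

Under these assumed intrinsics, the (-2, 0, 0) displacement puts the epipole at infinity, while (-2, 0, 4) puts it inside the image, which is closer to the forward-motion regime the paper describes.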
I wonder if the optimal solution is some kind of a 'best of both worlds' scenario? Has anyone worked on similar problems before, who can share some wisdom? Thanks.