Pose estimation in unknown scene

asked 2013-04-23 11:32:30 -0600

diego

Hello!

I'm trying to recover the relative pose between two internally calibrated cameras in an unknown scene. All the methods I have found so far (SVD of the essential matrix, vanishing points, and others) return the translation only up to scale. However, I need the exact translation, including its magnitude, in order to apply reconstruction methods such as plane sweep and voxel coloring.

How can I do this? Can anyone help me?


1 answer

3

answered 2013-04-23 17:59:35 -0600

HD_Mouse

Do you have any external information beyond the images themselves? If not, you will only ever be able to determine T up to scale. For example, stereo camera setups have a known distance between the cameras (the baseline); comparing the length of the estimated T to that distance gives the scale factor, which then carries over to the rest of the scene. In your case, you'll need an object of known size, or at a known distance from the cameras, that is visible in both views, and then do the math to calculate the scale of T.
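As a rough illustration of "doing the math", here is a minimal sketch in Python with NumPy. It assumes you already have a unit-norm translation from an essential-matrix decomposition and some triangulated points in the same arbitrary units, and that two of those points lie on an object whose real length you have measured. The function name `metric_scale` and all the numbers are hypothetical.

```python
import numpy as np


def metric_scale(points_up_to_scale, idx_a, idx_b, known_distance):
    """Return the factor that converts reconstruction units to real units.

    points_up_to_scale: (N, 3) triangulated points in arbitrary units.
    idx_a, idx_b: indices of two points whose real separation is known.
    known_distance: that separation in real units (e.g. metres).
    """
    reconstructed = np.linalg.norm(
        points_up_to_scale[idx_a] - points_up_to_scale[idx_b]
    )
    if reconstructed == 0:
        raise ValueError("reference points coincide; cannot determine scale")
    return known_distance / reconstructed


# Hypothetical unit-norm translation from an essential-matrix decomposition.
t_unit = np.array([0.6, 0.0, 0.8])

# Hypothetical triangulated points, in the same arbitrary units as t_unit.
points = np.array([
    [0.0, 0.0, 4.0],
    [0.5, 0.0, 4.0],
    [0.2, 0.3, 5.0],
])

# Assumption: points 0 and 1 are the two ends of an object measured as 0.25 m.
scale = metric_scale(points, 0, 1, known_distance=0.25)

# The same factor applies to the translation and to every scene point.
t_metric = scale * t_unit
points_metric = scale * points
```

The same idea works with a known camera baseline: divide the measured baseline by the norm of the estimated T and apply that factor everywhere.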


Comments

All I have is the internal parameters. Is the scale of T related in any way to the focal length? I mean, the essential matrix is found with the help of the internal camera matrix, so I was wondering whether T and the focal length could be related. Another question: if I have multiple images taken with the same camera and find the pose between pairs, would the translations between such pairs be proportional? That is, would all the pairwise translations be in the same "unit"?

diego (2013-04-25 05:18:53 -0600)

For your first question, no. The essential matrix itself is only defined up to scale. What you're getting is the epipolar geometry of the scene, which describes the relationship of the two cameras to points in the scene. It's all projective, though, so all you'll get from the images is the geometry, with no scale. You can read more on its properties here, but a quick summary is that the essential matrix multiplied by any nonzero scale factor describes the same scene geometry.
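A small numerical check can make this concrete. The sketch below (Python with NumPy, synthetic data, hypothetical helper names) assumes the convention X2 = R X1 + t with E = [t]x R, and normalized image coordinates, meaning the intrinsics have already been removed. It scales the scene and the baseline together and shows that the images do not change, so nothing in them can reveal the scale.

```python
import numpy as np


def skew(v):
    """Cross-product matrix [v]_x, so that skew(v) @ w == np.cross(v, w)."""
    return np.array([
        [0.0, -v[2], v[1]],
        [v[2], 0.0, -v[0]],
        [-v[1], v[0], 0.0],
    ])


def essential(R, t):
    """E = [t]_x R for the convention X2 = R @ X1 + t."""
    return skew(t) @ R


def translation_direction(E):
    """Unit vector u with u^T E = 0, i.e. the direction of t up to sign."""
    U, _, _ = np.linalg.svd(E)
    return U[:, 2]  # left singular vector of the zero singular value


# Synthetic relative pose: 10 degree rotation about y, arbitrary translation.
theta = np.deg2rad(10.0)
R = np.array([
    [np.cos(theta), 0.0, np.sin(theta)],
    [0.0, 1.0, 0.0],
    [-np.sin(theta), 0.0, np.cos(theta)],
])
t = np.array([1.0, 0.2, 0.1])
X1 = np.array([0.3, -0.2, 5.0])  # scene point in the camera-1 frame

images_1, images_2, residuals, directions = [], [], [], []
for s in (1.0, 7.5):
    # Scale the whole world (scene point and baseline) by s.
    X1_s, t_s = s * X1, s * t
    X2_s = R @ X1_s + t_s
    x1 = X1_s / X1_s[2]  # normalized image coordinates in camera 1
    x2 = X2_s / X2_s[2]  # normalized image coordinates in camera 2
    E = essential(R, t_s)
    images_1.append(x1)
    images_2.append(x2)
    residuals.append(x2 @ E @ x1)  # epipolar constraint, should be ~0
    directions.append(translation_direction(E))

same_images = np.allclose(images_1[0], images_1[1]) and np.allclose(
    images_2[0], images_2[1]
)
same_direction = abs(directions[0] @ directions[1])  # 1.0 means parallel
```

Both worlds produce the same image points and the same translation direction, and the recovered direction always has unit norm regardless of the true baseline length.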

HD_Mouse (2013-04-25 23:54:06 -0600)

For your second question, when you start introducing more cameras, rather than calculating a matrix for each pair, you can calculate a single object that describes the scene geometry between all of them. For example, with 3 separate poses/cameras, you can calculate the "trifocal tensor" (the fundamental matrix is a bifocal tensor). With N cameras, you can calculate the "N-focal tensor." It will describe the geometry between the cameras, but again, only up to scale. Without external information, you won't get scale. For a whole book on these matrices, read Multiple View Geometry in Computer Vision by Hartley and Zisserman. It's practically the bible on this subject.
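On the pairwise case specifically: each pair decomposed on its own yields a unit-norm translation, so the pairwise results are not in the same unit by default. One common alternative to a multi-view tensor (not something this answer prescribes) is to chain the scales through a scene point seen in all views. The sketch below, in Python with NumPy, uses synthetic cameras with identity rotations and hypothetical helper names; it triangulates the same point in camera 2's frame from two different pairs and uses the ratio of the depths as the relative scale.

```python
import numpy as np


def depth_in_reference(x_ref, x_other, R, t):
    """Depth of a point along x_ref, given X_other = R @ X_ref + t.

    Solves d_ref * (R @ x_ref) - d_other * x_other = -t by least squares.
    With a unit-norm t, the depth comes out in units of the true baseline.
    """
    A = np.column_stack((R @ x_ref, -x_other))
    depths, *_ = np.linalg.lstsq(A, -t, rcond=None)
    return depths[0]


def unit(v):
    return v / np.linalg.norm(v)


def project(X):
    """Normalized image coordinates (intrinsics assumed already removed)."""
    return X / X[2]


# Synthetic ground truth (unknown in practice): three camera centres with
# identity rotations, so X_cam_i = X_world - C_i.
C1 = np.array([0.0, 0.0, 0.0])
C2 = np.array([2.0, 0.0, 0.0])
C3 = np.array([5.0, 0.0, 0.0])
X_world = np.array([1.0, 0.5, 10.0])
R_identity = np.eye(3)

x1 = project(X_world - C1)
x2 = project(X_world - C2)
x3 = project(X_world - C3)

# What a pairwise essential-matrix decomposition would give: unit directions.
t21_unit = unit(C2 - C1)  # X1 = X2 + (C2 - C1)
t23_unit = unit(C2 - C3)  # X3 = X2 + (C2 - C3)

# Depth of the shared point in camera 2, measured in each pair's own unit.
depth_a = depth_in_reference(x2, x1, R_identity, t21_unit)
depth_b = depth_in_reference(x2, x3, R_identity, t23_unit)

# The point has one true depth, so the ratio equals |b23| / |b21|.
relative_scale = depth_a / depth_b
t23_in_pair21_units = relative_scale * t23_unit
```

After this step the two translations share a unit (here, the length of the first baseline), but the reconstruction as a whole is still only known up to one global scale.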

HD_Mouse (2013-04-26 00:00:11 -0600)

Last updated: Apr 23 '13