# Pose estimation in unknown scene

Hello!

I'm trying to retrieve the pose between two internally calibrated cameras in an unknown scene and all methods that I have found so far (SVD of the essential matrix, vanishing points and others) return the translation up to a scale. However, I need the (exact) estimation of such translation, in order to apply some reconstruction methods (plane sweep, voxel coloring, etc.).

How can I do this? Can anyone help me?



Do you have any other external information aside from just the cameras? Otherwise, you will only ever be able to determine T up to scale. For example, stereo camera setups use the known distance between the cameras as their baseline, and comparing T to that distance is a trivial, easy way to translate the reconstruction to metric units. In your case, you'll need a common object of known size, or a known distance from the cameras, and then do the math to compute the scale of T.
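As a minimal sketch of that last step: if you have triangulated (up to scale) the two endpoints of an object whose real length you know, the ratio of true length to reconstructed length is the metric scale factor for T. The function name and toy values below are illustrative, not part of any library API.

```python
import numpy as np

def scale_from_known_length(X1_est, X2_est, known_length):
    """X1_est, X2_est: triangulated endpoints of a segment of known
    physical length, reconstructed using the unit-norm translation
    from the essential matrix. Returns the factor that maps the
    up-to-scale reconstruction into metric units."""
    est_length = np.linalg.norm(X1_est - X2_est)
    return known_length / est_length

# Toy check: a segment reconstructed as 0.5 units that is really 1.0 m.
X1 = np.array([0.0, 0.0, 2.0])
X2 = np.array([0.5, 0.0, 2.0])
s = scale_from_known_length(X1, X2, known_length=1.0)

t_hat = np.array([1.0, 0.0, 0.0])   # unit-norm translation from E
t_metric = s * t_hat                # translation in metres
```

Once you have `t_metric`, the same factor `s` rescales every triangulated point, so plane sweep or voxel coloring can then operate in real-world units.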


All I have is the internal parameters. Is the scale of T related in any way to the focal length? I mean, the essential matrix is found with the help of the internal camera matrix, so I was wondering if T and the focal length could be related. Another question: if I have multiple images taken with the same camera and find the pose between pairs, would the translations between those pairs be proportional? That is, would all the pairwise translations be in the same "unit"?

( 2013-04-25 05:18:53 -0500 )edit

For your first question, no. The essential matrix itself is only defined up to scale. What you're getting is the epipolar geometry of the scene, which describes the relationship of the two cameras to points in the scene. It's all projective, though, so all you'll get is the geometry, with no scale. A quick summary of its properties: the essential matrix multiplied by any nonzero scalar gives you the same scene geometry.
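You can see that scale invariance directly from the epipolar constraint x2ᵀ E x1 = 0: multiplying E by any nonzero scalar leaves the constraint satisfied. A small numpy sketch (with an arbitrary toy pose, not from any real data):

```python
import numpy as np

def skew(t):
    """Cross-product matrix [t]_x so that skew(t) @ v == np.cross(t, v)."""
    return np.array([[0.0, -t[2], t[1]],
                     [t[2], 0.0, -t[0]],
                     [-t[1], t[0], 0.0]])

R = np.eye(3)                       # arbitrary relative rotation
t = np.array([1.0, 0.0, 0.0])       # translation direction only (unit norm)
E = skew(t) @ R                     # essential matrix E = [t]_x R

X = np.array([0.3, -0.2, 4.0])      # a 3D point in the first camera's frame
x1 = X / X[2]                       # normalized image coords, camera 1
X2 = R @ X + t
x2 = X2 / X2[2]                     # normalized image coords, camera 2

r1 = x2 @ E @ x1                    # epipolar residual for E
r2 = x2 @ (5.0 * E) @ x1            # identical constraint for 5*E
```

Both residuals are zero, so no measurement of image points can ever distinguish E from any scalar multiple of it; that is exactly why the baseline length is unrecoverable.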

( 2013-04-25 23:54:06 -0500 )edit

For your second question, when you start introducing more cameras, rather than calculating a matrix for each pair, you can calculate a single object that describes the scene geometry between all of them. For example, with 3 separate poses/cameras you can calculate the "trifocal tensor" (the fundamental matrix is the bifocal case), and with N cameras an "N-focal tensor." It will describe the geometry between all the cameras consistently, but again, only up to a single global scale. Without external information, you won't get scale. For a whole book on these objects, read Multiple View Geometry by Hartley and Zisserman. It's practically the bible on this subject.
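The reason no number of extra views helps is that scaling every camera translation and every scene point by the same factor leaves all the images pixel-for-pixel identical. A toy demonstration (arbitrary poses and a random point, purely illustrative):

```python
import numpy as np

def project(R, t, X):
    """Pinhole projection of world point X into a camera with pose (R, t)."""
    Xc = R @ X + t
    return Xc[:2] / Xc[2]

rng = np.random.default_rng(0)
X = rng.normal(size=3) + np.array([0.0, 0.0, 10.0])   # a point in front

# Three camera poses sharing one coordinate frame (identity rotations
# for simplicity; the argument holds for any rotations).
poses = [(np.eye(3), np.zeros(3)),
         (np.eye(3), np.array([1.0, 0.0, 0.0])),
         (np.eye(3), np.array([0.0, 1.0, 0.0]))]

s = 7.0   # an arbitrary global scale
max_diff = 0.0
for R, t in poses:
    a = project(R, t, X)          # original scene
    b = project(R, s * t, s * X)  # everything scaled by s
    max_diff = max(max_diff, np.abs(a - b).max())
```

Since R(sX) + st = s(RX + t) and the scale cancels in the perspective division, `max_diff` is zero: all N views are consistent with infinitely many scaled copies of the scene, which is the ambiguity the N-focal tensor inherits.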

( 2013-04-26 00:00:11 -0500 )edit


## Stats

Asked: 2013-04-23 11:32:30 -0500

Seen: 847 times

Last updated: Apr 23 '13