# Units of Rotation and translation from Essential Matrix

I obtained my Rotation and translation matrices using the SVD of E. I used the following code to do so :

    SVD svd(E);
Matx33d W(0, -1, 0, 1, 0, 0, 0, 0, 1);
Mat_<double> R = svd.u * Mat(W) * svd.vt;
Mat_<double> t = svd.u.col(2);
Matx34d P1(R(0, 0), R(0, 1), R(0, 2), t(0), R(1, 0), R(1, 1), R(1, 2), t(1), R(2, 0), R(2, 1), R(2, 2), t(2));


I want to know what are the units of R and t ? When I calibrated the cameras I got the baseline distance in cm , So is the translation vector units also in cm ? Also what are the units of R?

P.S : I have read the following question on Stackoverflow , but it had actually confused me as no answer was accepted. Also I wanted to know if the translation vector I got is it in the world frame?(Is it the real world distance moved from the initial position)

EDIT 1: I have also observed that the values are normalized for the translation vector i.e x^2+y^2+z^2 is nearly 1. So how do I get the actual vector?

EDIT 2: I have read the following question in Stackoverflow and I think I will be implementing it. Don't know if there is any better way.

edit retag close merge delete

Sort by » oldest newest most voted

It depends whether you have knowledge about an objects absolute metric size (e.g., the basewidth of a stereo camera setup). The Essential Matrix E decomposes into rotation R and translation t, but notice that there are four possible solutions, but only one geometrically feasible one (see here, slide 8). After doing the cheirality check, i.e., all 3D points must be in front of the two cameras, you get the feasible solution.

The translation vector t (and rotation matrix R) yield your camera matrix P = [R|t], where R corresponds to the camera orientation in world coordinates and t is the position of the world origin in camera coordinates. Therefore, to obtain the camera origin c in world coordinates, you calculate:

c = -inverse(R)*t


As you noticed in your EDIT 1: The translation vector is normalized to unit. This is due to the SVD, which always returns a solution normalized to unit. Therefore, it is impossible to retrieve an absolute metric translation (in m, mm or whatever) without any additional knowledge of the absolute dimensions of an observed object, but you only obtain a correct solution up-to-scale.

Edit: you might also take a look at this post for the calculation of the four possible solutions.

more

Thank you very much. I need little more clarity on convention of the coordinates. So camera origin c is position of camera in world coordinates, which keep changing if the scene changes? My understanding is that after finding Fundamental matrix(findFundamentalMat) from from the common keypoints of Frame1 and Frame2, from which R,t are extracted from Essential matrix. The camera position at Frame1 is subjected to R rotation and t translation will give camera position at Frame2. Is that right or wrong?

( 2015-07-24 02:32:13 -0500 )edit
1

Yes, the camera origin c is the position in world coordinates. When extracting the camera matrix P2 = [R|t] of your second view, you always assume, that the world coordinates coincide with the first camera, such that the first camera actually has a camera matrix P1 = [identity|0]. Therefore, R and t represent the motion from the second camera to the first camera (or world coordinate origin) in the second camera's coordinate frame. Finally, you need to invert this motion, to obtain the pose of your second camera w.r.t to the world coordinate system. This blog post is very helpful to understand these geometric relations. So regarding your last question in above comment: It's exactly the other way around.

( 2015-07-24 03:19:47 -0500 )edit

Thanks. For my Edit 2 : When finding the scaling factor lambda, (sx2, sy2, 1) = K(R|lambda*T)(X,Y,Z) The (X,Y,Z) are coordinates in Frame2 and sx2,sy2 are the pixel coordinates in Frame1 ? Assuming I have (X,Y,Z) for different keypoints , I will get a set of K values, for which I will take a median to estimate it. So Is this scaling factor changing or it is constant throughout motion? When I was doing the above method for a static scene, I was getting the some [R|t] instead of [identity|0] (The values of [R|t] was same for all static scenes incidentally ) , is that due to calculation errors or will the scaling factor make it nearly [identity|0]?

( 2015-07-24 03:59:07 -0500 )edit
1

The camera matrix from an essential matrix obtained with two views, is always the camera matrix of the second camera, P2 = K*[R|t] (considering calibration matrix K), based on the assumption that the world coordinate frame has its origin at the first camera center. A 2D point x seen in the second image is given by x = P2*X, where X is a 3D point in world coordinates. So whenever you calculate the camera pose from a new pair of views, the camera pose is always relative to the first view of this pair. So let's say you have ndifferent views. You could e.g. always calculate the essential matrix w.r.t the first view and then, from the first set of triangulated keypoints, you should ideally be able to estimate the correct scale of all other views w.r.t the first (unit) translation.

( 2015-07-24 04:17:32 -0500 )edit

Ok, So essentially the scale factor is constant. But then getting some [R|t] instead of [identity|0] for static scene , should either be calculation mistake or may be due to SVD . Is there any reason why this is happening? Programmatically I am solving this by ignoring the static frame as the [R|t] had certain values when Frame1= Frame2 . Thank you very much, you have pretty much cleared all my doubts and also added finer details for implementation like checking the feasible rotations and translations.

( 2015-07-24 04:43:34 -0500 )edit
1

If the scene is static, then the scale factor of all 3D points should be constant w.r.t. to one specific view. You never get [identitiy|0] from the SVD decomposition of the Essential matrix, because this is the underlying assumption of the first camera that allows you to calculate [R|t] of the second camera. Or am I maybe not understanding your question correctly? So by static scene, do you mean a rigid scene with two images taken from different poses?

( 2015-07-24 06:37:21 -0500 )edit

By static scene I meant that the Frame1 and Frame2 are the same image i.e in the findFundamentalMat is inputed the exact same points. Which effectively means there is no rotation nor translation from the previous position.(Effectively the first camera and second camera positions are same, as I am assuming the first camera position as [identitiy|0], then the second camera position should also be [identitiy|0], atleast the rotation part may not be identity matrix, but the translation has to be zero )

( 2015-07-24 11:07:52 -0500 )edit
1

I think the problem is that the Fundamental matrix is defined by the epipolar geometry, but I can't recall the details right now and what could probably be the cause of your problem.

( 2015-07-24 11:52:20 -0500 )edit

Official site

GitHub

Wiki

Documentation