OpenCV Q&A Forum
Units of Rotation and translation from Essential Matrix
http://answers.opencv.org/question/66839/units-of-rotation-and-translation-from-essential-matrix/

I obtained my rotation and translation matrices from the SVD of E, using the following code:
    SVD svd(E);
    Matx33d W(0, -1, 0,            // W, used to build R = U * W * V^T
              1,  0, 0,
              0,  0, 1);
    Mat_<double> R = svd.u * Mat(W) * svd.vt;
    Mat_<double> t = svd.u.col(2); // translation: third column of U (up to sign and scale)
    Matx34d P1(R(0, 0), R(0, 1), R(0, 2), t(0),
               R(1, 0), R(1, 1), R(1, 2), t(1),
               R(2, 0), R(2, 1), R(2, 2), t(2));
I want to know: what are the units of R and t? When I calibrated the cameras I got the baseline distance in **cm**, so is the translation vector also in **cm**? And what are the units of R?

P.S.: I have read the following question on [Stackoverflow](https://stackoverflow.com/questions/16856177/what-will-be-the-translation-vector-unit-from-essential-matrix/16870447#16870447), but it actually confused me, as no answer was accepted. I also wanted to know whether the translation vector I got is in the world frame (i.e., is it the real-world distance moved from the initial position?).

EDIT 1: I have also observed that the translation vector is normalized, i.e. x^2 + y^2 + z^2 is nearly 1. So how do I get the actual vector?
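To make the observation in EDIT 1 concrete, here is a small Python sketch with made-up numbers: the recovered `t` has unit norm, and multiplying by an externally known baseline (e.g. from calibration) is what restores metric units.

```python
import math

# Hypothetical translation direction recovered from the essential matrix:
# the SVD-based decomposition only determines it up to scale (unit norm).
t_unit = [0.6, 0.0, 0.8]
norm = math.sqrt(sum(v * v for v in t_unit))   # x^2 + y^2 + z^2 under the root
print(norm)   # close to 1.0

# If the baseline length is known from calibration (say 12 cm), the
# metric translation is simply the unit direction scaled by that length.
baseline_cm = 12.0
t_cm = [baseline_cm * v for v in t_unit]
print(t_cm)   # close to [7.2, 0.0, 9.6]
```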
EDIT 2: I have read the following question on [Stackoverflow](https://stackoverflow.com/questions/3678317/t-and-r-estimation-from-essential-matrix?rq=1) and I think I will implement that approach. I don't know if there is a better way. (Asked Wed, 22 Jul 2015.)

Answer by takahashi:
It depends on whether you have knowledge of an object's absolute metric size (e.g., the baseline of a stereo camera setup). The essential matrix `E` decomposes into a rotation `R` and a translation `t`, but note that there are four possible solutions, only one of which is geometrically feasible (see [here](http://isit.u-clermont1.fr/~ab/Classes/DIKU-3DCV2/Handouts/Lecture16.pdf), slide 8). After the cheirality check, i.e., requiring that all 3D points lie in front of both cameras, you are left with the feasible solution.

The translation vector `t` (and rotation matrix `R`) yield your camera matrix `P = [R|t]`, where `R` corresponds to the camera orientation in world coordinates and `t` is the position of the world origin in camera coordinates. Therefore, to obtain the camera origin `c` in world coordinates, you calculate:

    c = -inverse(R)*t

As you noticed in your EDIT 1, the translation vector is normalized to unit length. This is due to the SVD, which always returns a solution of unit norm. It is therefore impossible to retrieve an absolute metric translation (in m, mm, or whatever) without additional knowledge of the absolute dimensions of an observed object; you only obtain a solution that is correct *up to scale*.
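A small numeric check of `c = -inverse(R)*t` (a pure-Python sketch with a made-up pose; for a rotation matrix, `inverse(R)` equals its transpose):

```python
import math

def transpose(M):
    return [[M[j][i] for j in range(3)] for i in range(3)]

def matvec(M, v):
    return [sum(M[i][k] * v[k] for k in range(3)) for i in range(3)]

# A made-up pose: rotation of 90 degrees about the z-axis.
a = math.pi / 2
R = [[math.cos(a), -math.sin(a), 0],
     [math.sin(a),  math.cos(a), 0],
     [0,            0,           1]]

# t = -R*c places a camera whose center c is at (1, 2, 3) in world coords.
c_true = [1.0, 2.0, 3.0]
t = [-x for x in matvec(R, c_true)]

# c = -inverse(R)*t; inverse(R) == transpose(R) for a rotation matrix.
c = [-x for x in matvec(transpose(R), t)]
print(c)   # recovers (1, 2, 3) up to floating-point error
```

Note that if you substitute the unit-norm `t` from the decomposition here, the recovered `c` is of course also only determined up to the same unknown scale.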
Edit: you might also take a look at this [post](http://answers.opencv.org/question/18565/pose-estimation-produces-wrong-translation-vector/) for the calculation of the four possible solutions. (Fri, 24 Jul 2015, 01:49)
Comment by takahashi (Fri, 24 Jul 2015, 06:37): If the scene is static, then the scale factor of all 3D points should be constant w.r.t. one specific view. You never get `[identity|0]` from the SVD decomposition of the essential matrix, because that is the underlying assumption about the first camera which allows you to calculate `[R|t]` of the second camera. Or am I perhaps not understanding your question correctly? By "static scene", do you mean a rigid scene with two images taken from different poses?
Comment by takahashi (Fri, 24 Jul 2015, 03:19): Yes, the camera origin `c` is the position in world coordinates. When extracting the camera matrix `P2 = [R|t]` of your second view, you always assume that the world coordinate frame coincides with the first camera, so that the first camera has camera matrix `P1 = [identity|0]`. Therefore, `R` and `t` represent the motion from the second camera to the first camera (or world coordinate origin), expressed in the second camera's coordinate frame. Finally, you need to invert this motion to obtain the pose of your second camera w.r.t. the world coordinate system. This [blog post](http://ksimek.github.io/2012/08/22/extrinsic/) is very helpful for understanding these geometric relations. So, regarding the last question in your comment above: it's exactly the other way around.
Comment by Sai Kondaveeti (Fri, 24 Jul 2015, 02:32): Thank you very much. I need a little more clarity on the coordinate conventions. So the camera origin `c` is the position of the camera in world coordinates, which keeps changing as the scene changes? My understanding is that after finding the fundamental matrix (**findFundamentalMat**) from the common keypoints of Frame1 and Frame2, `R` and `t` are extracted from the essential matrix; applying the rotation `R` and translation `t` to the camera position at Frame1 then gives the camera position at Frame2. Is that right or wrong?
Comment by Sai Kondaveeti (Fri, 24 Jul 2015, 11:07): By "static scene" I meant that Frame1 and Frame2 are the same image, i.e. **findFundamentalMat** is given the exact same points, which effectively means there is no rotation or translation from the previous position. (Effectively the first and second camera positions are the same: since I assume the first camera pose is `[identity|0]`, the second camera pose should also be `[identity|0]`; at the very least the translation has to be zero, even if the rotation part may not come out as the identity matrix.)
Comment by takahashi (Fri, 24 Jul 2015, 04:17): The camera matrix from an essential matrix obtained from two views is always the camera matrix of the second camera, `P2 = K*[R|t]` (considering the calibration matrix `K`), based on the assumption that the world coordinate frame has its origin at the first camera center. A 2D point `x` seen in the second image is given by `x = P2*X`, where `X` is a 3D point in world coordinates. So whenever you calculate the camera pose from a new pair of views, that pose is always relative to the first view of the pair. Say you have `n` different views: you could, e.g., always calculate the essential matrix w.r.t. the first view, and then, from the first set of triangulated keypoints, you should ideally be able to estimate the correct scale of all other views w.r.t. the first (unit) translation.
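One way the scale estimation described above could be implemented (an illustrative Python sketch with fabricated points, not code from the thread): ratios of corresponding inter-point distances between two triangulated point sets are invariant to the choice of origin, and their median gives a robust scale estimate.

```python
import math
from itertools import combinations
from statistics import median

def dist(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

# Hypothetical keypoints triangulated from the reference view pair
# (treated as having the "correct" scale)...
pts_ref = [(1.0, 0.5, 4.0), (-0.5, 1.0, 5.0), (0.2, -0.3, 6.0), (1.5, 1.2, 4.5)]

# ...and the same keypoints triangulated from a new pair, which came out
# 2.5x too small and shifted (the shift cancels out in distance ratios).
s_true = 2.5
pts_new = [tuple(v / s_true + 1e-4 for v in p) for p in pts_ref]

# One distance ratio per pair of corresponding points; take the median.
ratios = [dist(a, b) / dist(c, d)
          for (a, c), (b, d) in combinations(list(zip(pts_ref, pts_new)), 2)]
scale = median(ratios)
print(scale)   # close to 2.5
```

The median makes the estimate tolerant of a few badly triangulated outlier points, which is why it is preferable here to a plain mean.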
Comment by Sai Kondaveeti (Fri, 24 Jul 2015, 04:43): OK, so essentially the scale factor is constant. But then getting some `[R|t]` instead of `[identity|0]` for a static scene must be either a calculation mistake or perhaps an artifact of the SVD. Is there any reason why this happens? Programmatically I am working around it by ignoring the static frame, since `[R|t]` took on certain fixed values whenever Frame1 = Frame2. Thank you very much; you have pretty much cleared up all my doubts and also added finer implementation details, like checking for the feasible rotation and translation.
Comment by takahashi (Fri, 24 Jul 2015, 11:52): I think the problem is that the fundamental matrix is defined by the epipolar geometry, but I can't recall the details right now, nor what might be the cause of your problem.
Comment by Sai Kondaveeti (Fri, 24 Jul 2015, 03:59): Thanks. For my EDIT 2: when finding the scaling factor **lambda** in `(sx2, sy2, 1) = K(R|lambda*T)(X,Y,Z)`, are (X,Y,Z) the coordinates in Frame2, and (sx2, sy2) the pixel coordinates in Frame1? Assuming I have (X,Y,Z) for different keypoints, I will get a set of **lambda** values, of which I will take the median as the estimate. Is this scaling factor changing, or is it constant throughout the motion?

When I was doing the above for a static scene, I was getting some `[R|t]` instead of `[identity|0]` (incidentally, the values of `[R|t]` were the same for all static scenes). Is that due to calculation errors, or will the scaling factor make it nearly `[identity|0]`?
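For completeness, here is one possible reading of the median-of-lambda idea from that comment, with fabricated data and `K = R = identity` for brevity; the per-point solve uses only the u-row of the projection equation. This is an illustration of the algebra, not a reference implementation.

```python
from statistics import median

# Hypothetical setup: the second camera is displaced by lam * T for some
# unknown metric scale lam; K and R are taken as identity to keep the
# algebra short, so a pixel of X is x ~ X + lam * T (homogeneous).
T = (1.0, 0.0, 0.0)    # unit-norm direction from the essential matrix
lam_true = 7.5         # ground truth, used only to fabricate observations

pts = [(0.5, 0.2, 10.0), (-1.0, 0.8, 12.0), (2.0, -0.5, 9.0)]

def project(X, lam):
    p = [X[i] + lam * T[i] for i in range(3)]
    return (p[0] / p[2], p[1] / p[2])

obs = [project(X, lam_true) for X in pts]

# From u = (X1 + lam*T1) / (Z + lam*T3), solving for lam gives
# lam = (u*Z - X1) / (T1 - u*T3): one estimate per point, then the median.
est = []
for (u, v), (X1, X2, Z) in zip(obs, pts):
    est.append((u * Z - X1) / (T[0] - u * T[2]))
print(median(est))   # recovers lam_true up to floating-point error
```

With noisy real observations the per-point estimates differ, and the median serves the same outlier-rejection role as in the scale estimation sketched earlier in the thread.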