OpenCV Q&A Forum
Copyright OpenCV foundation, 2012-2018.

Stereo camera pose estimation from solvePnPRansac using 3D points given wrt. the camera coordinate system
http://answers.opencv.org/question/185671/stereo-camera-pose-estimation-from-solvepnpransac-using-3d-points-given-wrt-the-camera-coordinate-system/

I know that there exist many posts about pose estimation using solvePnP/solvePnPRansac, and I've read most of them, but my case differs slightly from what seems to be the standard scenario. Even though I think I've understood it, I just can't seem to get it to work, and I'd like someone to correct me if I'm doing anything wrong. This post became quite long, but please bear with me.
I'm trying to use solvePnPRansac to calculate the motion of a stereo camera from one frame/time instance to the next. I detect features in the first frame and track them to the next frame. I'm also using a stereo camera that comes with an SDK which provides me with the corresponding 3D coordinates of the detected features. **These 3D points are wrt. the camera coordinate system**. In other words, the 2D/3D points in the two consecutive frames correspond to the same features, but if the camera moves between the frames, the coordinates change (even the 3D points, since they are relative to the camera origin).
I believe that the 3D input points of solvePnPRansac should be wrt a world frame, but since I don't have a world frame, I try to do the following:
1) For the very first frame: I set the initial camera pose as the world frame, since I need a constant reference for computing relative movement. This means that the 3D points calculated in this frame now equal the world points, and that the movement of the camera is relative to the initial camera pose.
2) Call solvePnPRansac with the world points from the first frame together with the 2D features detected in the second frame as inputs. It returns rvec and tvec.
**Now for my first question:** Is tvec the vector from the camera origin (/the second frame) to the world origin (/the first frame), given in the camera's coordinate system?
**Second question:** I want the vector from the world frame to the camera/second frame given in world frame coordinates (this should equal the translation of the camera relative to the original pose = world frame), so I need to use **translation = -(R)^T * tvec**, where R is the rotation matrix given by rvec?
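For reference, that formula can be sanity-checked with a small stand-alone sketch. This is plain C++ without OpenCV and assumes solvePnP's convention p_cam = R * p_world + tvec; in OpenCV, R would first be obtained from rvec via cv::Rodrigues.

```cpp
#include <array>

using Vec3 = std::array<double, 3>;
using Mat3 = std::array<std::array<double, 3>, 3>;

// Assuming p_cam = R * p_world + tvec, the camera origin expressed in
// world coordinates is C = -R^T * tvec.
Vec3 cameraPositionInWorld(const Mat3& R, const Vec3& t) {
    Vec3 C{};
    for (int i = 0; i < 3; ++i)
        for (int j = 0; j < 3; ++j)
            C[i] -= R[j][i] * t[j];  // note R[j][i]: this applies R^T
    return C;
}
```

For a 90-degree rotation about z with tvec = (1, 2, 3), this gives the camera position (-2, 1, -3) in world coordinates.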
Now I'm a little confused as to which 3D points I should use in the further calculations. Should I transform the 3D points detected in the second frame (which are given wrt. the camera) to the world frame? If I combine tvec and rvec into a homogeneous transformation matrix T (which would represent the homogeneous transformation from the world frame to the second frame), the transformation should be
**3Dpoints_at_frame2_in_worldcoordinates = T^(-1) * 3Dpoints_at_frame2_in_cameracoordinates**
If I do this, I can capture a new image (third frame), track the 2D features detected in the second frame to the third frame, compute the corresponding 3D points (which are given wrt. the third frame) and call solvePnPRansac with "3Dpoints_at_frame2_in_worldcoordinates" and the 2D features at the third frame as inputs. The returned rvec and tvec represent a transformation of world points into the third frame, i.e. if I use the same formula as in my second question I would get the absolute movement from the world frame to the third frame. And if I create a new homogeneous transformation matrix, I can convert the 3D points at the third frame into world coordinates and use those points in the next iteration (fourth frame). **Does this make any sense?**
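The T^(-1) mapping above can be written without forming the 4x4 matrix explicitly: p_world = R^T * (p_cam - tvec). A plain-C++ sketch under the same convention assumption (p_cam = R * p_world + tvec):

```cpp
#include <array>

using Vec3 = std::array<double, 3>;
using Mat3 = std::array<std::array<double, 3>, 3>;

// Invert the pose transform: if p_cam = R * p_world + t, then
// p_world = R^T * (p_cam - t). This is T^(-1) applied to a 3D point
// given in the camera frame.
Vec3 cameraToWorld(const Mat3& R, const Vec3& t, const Vec3& p_cam) {
    Vec3 d = {p_cam[0] - t[0], p_cam[1] - t[1], p_cam[2] - t[2]};
    Vec3 p{};
    for (int i = 0; i < 3; ++i)
        for (int j = 0; j < 3; ++j)
            p[i] += R[j][i] * d[j];  // R^T * d
    return p;
}
```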
Any answers, or attempts at pointing out things I might have misunderstood (or understood, for that matter), are greatly appreciated!
Wed, 28 Feb 2018 04:49:20 -0600

Answer by Der Luftmensch:
Do PnP with the previous frame's 3D points (either camera or world coordinates) and the current frame's 2D points. If the 3D points have world coordinates you are done. If they are local camera coordinates, you must add the computed Rt transform to your global transform.
Read the following:
[Visual Odometry: Part 1](https://www.ifi.uzh.ch/dam/jcr:5759a719-55db-4930-8051-4cc534f812b1/VO_Part_I_Scaramuzza.pdf)
and
[Visual Odometry: Part 2](https://www.researchgate.net/profile/Davide_Scaramuzza2/publication/241638257_Visual_Odometry_Part_II_-_Matching_Robustness_and_Applications/links/02e7e53787f70d39e5000000.pdf)
Wed, 28 Feb 2018 05:54:37 -0600

Comment by cathy:
Hi bendikiv, I'm also confused as to which 3D points I should use in the further calculations. The 3D points from the stereo camera are calculated in camera coordinates, so should I translate them to world coordinates using the R and t solved by PnP? Thank you!
Thu, 30 Aug 2018 21:38:59 -0500

Comment by Der Luftmensch:
Nice to hear you got it working. I've always gone with i and j for indexing images; for some reason it seems clearer than y,x or r,c or u,v. Remember that `at()` takes its arguments as (row, col), which is (y, x) or (i, j).
Fri, 06 Apr 2018 19:25:12 -0500

Comment by bendikiv:
It works! The concatenated position estimates drift quite a lot, but the whole thing finally works! The (major) fix, if it is of interest:
I extract the 3D coordinates of a corresponding 2D feature from the 3D image computed by `cv::reprojectImageTo3D()` using the syntax `Point3f feature_point_3D = image_3D.at<Point3f>(u, v)`, where (u, v) are the pixel coordinates (x, y) of the 2D feature and image_3D is of type `CV_32FC3`. The fix was to switch the indexing to (v, u), i.e. `image_3D.at<Point3f>(v, u)`.
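The reason the swap works is that `cv::Mat::at` takes (row, col), i.e. (y, x). A plain-C++ sketch of the underlying offset arithmetic in a row-major buffer:

```cpp
#include <cstddef>

// cv::Mat::at<T>(i, j) takes (row, col) = (y, x). A feature at pixel
// coordinates (u, v) = (x, y) therefore needs at<Point3f>(v, u): in the
// underlying row-major buffer that element sits at offset v * width + u.
std::size_t elementOffset(int u, int v, int width) {
    return static_cast<std::size_t>(v) * width + u;
}
```

Calling `at(u, v)` instead silently reads the element at offset u * width + v, which is a valid but wrong pixel whenever u < height, matching the "quick fix" behavior described above.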
After I did this, the projection of the 3D features back onto the image plane was almost identical to the original 2D features, and the subsequent pose estimation returned meaningful results. But it feels like a quick fix, and I should try to find out why...
Fri, 06 Apr 2018 12:03:41 -0500

Comment by bendikiv:
1. Original 2D feature = [73, 149], projected point from 3D = [149, 73]. OK, except the vector is flipped.
2. But if the original 2D feature = [134, 328], the projection = [8, 135].
3. And if the original 2D feature = [131, 335], the projection = [15, 131].
So the x-value of the projection (which should equal the original y-value) behaves like `x-value mod 320`. The 3D points used as inputs to `cv::projectPoints()` originate from `cv::reprojectImageTo3D()`... do you have any idea what might be wrong?
I've made my Q matrix equal to the one in the first answer of this post: https://stackoverflow.com/questions/27374970/q-matrix-for-the-reprojectimageto3d-function-in-opencv
Thu, 05 Apr 2018 11:26:57 -0500

Comment by bendikiv:
Nice. OK, I seem to be making some progress now: most of the projected 2D points (from the 3D features, using `cv::projectPoints()`) match the detected 2D points. However, when I print out the projected 2D points, the x and y values are flipped. My original vector of detected 2D features is [x, y], but the projected feature points are returned as [y, x]. If that were all, I could just flip the vector, no problem, but I've found a strange error:
My image resolution is 320x480 (width = x, height = y), but since the projected points are flipped to [y, x], `cv::projectPoints()` somehow seems to think that y is restricted to 320 and x to 480, when it's really the opposite. Difficult to explain, so I'll give an example in a comment below:
Thu, 05 Apr 2018 11:17:31 -0500

Comment by Der Luftmensch:
That's a good test. Yes, I would zero rvec and tvec. Q and P1 are closely related and share values. If you see a use, you can use `cv::perspectiveTransform()` to take a sparse set of (sub-pixel?) points to 3D, rather than `cv::reprojectImageTo3D()`.
Thu, 05 Apr 2018 09:06:55 -0500

Comment by bendikiv:
I see, thanks. And then I guess that it's the same for `cv::projectPoints()`: use P1 and a zeroed `distCoeffs` matrix? I would like to project the 3D points back into 2D to see if they match the original 2D points.
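With zeroed rvec, tvec and distCoeffs, `cv::projectPoints()` reduces to the plain pinhole model, so expected pixel values can be checked by hand. A stand-alone sketch (fx, fy, cx, cy stand in for the entries of P1):

```cpp
#include <array>

// Pinhole projection of a camera-frame 3D point with identity pose and
// zero distortion: u = fx*X/Z + cx, v = fy*Y/Z + cy.
std::array<double, 2> projectPinhole(const std::array<double, 3>& P,
                                     double fx, double fy,
                                     double cx, double cy) {
    return { fx * P[0] / P[2] + cx, fy * P[1] / P[2] + cy };
}
```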
And another question about `cv::projectPoints()`: it takes `rvec` and `tvec` as inputs. The 3D points that I want to project are given in the camera frame, so `rvec` and `tvec` should both be (0, 0, 0), right? I want to project the 3D coordinates of the detected 2D features back onto the image plane to see if they match the detected 2D features (which they obviously should).
Thu, 05 Apr 2018 08:26:58 -0500

Comment by Der Luftmensch:
Of course, if you are using the DUO SDK, you can't load the values directly as a `cv::Mat`. You'll need to figure out the ordering and which values to drop (the last column). I have [calibDuo](https://github.com/llschloesser/calibDuo) if you want to use that for calibrating with OpenCV. Please make it cmake compatible if you can.
Thu, 05 Apr 2018 07:55:10 -0500

Comment by Der Luftmensch:
Your images are undistorted and rectified, so you need to use P1 for your left-image camera matrix. You are right that it wants a 3x3, so you need to do something like `cv::Mat p1_3x3 = p1( cv::Rect( 0, 0, 3, 3 ) );`
Thu, 05 Apr 2018 07:49:27 -0500

Comment by bendikiv:
http://answers.opencv.org/question/185671/stereo-camera-pose-estimation-from-solvepnpransac-using-3d-points-given-wrt-the-camera-coordinate-system/?comment=188691#post-id-188691struct DUO_STEREO
{
double M1[9], M2[9]; // 3x3 - Camera matrices (left, right)
double D1[8], D2[8]; // 1x8 - Camera distortion parameters (left, right)
double R[9]; // 3x3 - Rotation between left and right camera
double T[3]; // 3x1 - Translation vector between left and right camera
double R1[9], R2[9]; // 3x3 - Rectified rotation matrices (left, right)
double P1[12], P2[12]; // 3x4 - Rectified projection matrices (left, right)
double Q[16]; // 4x4 - Disparity to depth mapping matrix
};
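Since the SDK exposes P1 as a raw row-major 3x4 double[12] rather than a `cv::Mat`, the 3x3 intrinsic part solvePnP wants is just the first three columns. A hypothetical helper sketching the copy (this mirrors what `p1( cv::Rect( 0, 0, 3, 3 ) )` selects on a `cv::Mat`):

```cpp
#include <array>

// Extract the 3x3 intrinsic block (first three columns) from a
// row-major 3x4 projection matrix such as DUO_STEREO::P1.
std::array<double, 9> intrinsicFromP(const double P[12]) {
    std::array<double, 9> K{};
    for (int r = 0; r < 3; ++r)
        for (int c = 0; c < 3; ++c)
            K[r * 3 + c] = P[r * 4 + c];  // drop column 3 of each row
    return K;
}
```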
Until now I've used M1 as the cameraMatrix in solvePnP, and the documentation also says that a 3x3 matrix is required...?
Thu, 05 Apr 2018 05:15:14 -0500

Comment by bendikiv:
Shouldn't the cameraMatrix in solvePnP be the 3x3 camera matrix from stereoRectify()? Anyhow, I get the stereo parameters from the DUO SDK, which look like this: (new comment needed)
Thu, 05 Apr 2018 05:14:00 -0500

Comment by Der Luftmensch:
Because you are getting depth from stereo, you are probably using undistorted images, so you can pass either an empty or a zeroed matrix for `distCoeffs`. You likely want to use `P1` from `cv::stereoRectify()` for `cameraMatrix`.
Wed, 04 Apr 2018 09:45:39 -0500

Comment by bendikiv:
I've become aware that the cv::solvePnP function, when given non-empty Q and M input variables, assumes that the feature points are extracted from unrectified images? See, I'm rectifying the images before I detect features, and until now I've used the true Q and M matrices as inputs. I've read somewhere that I should use empty matrices as inputs, but how exactly do I do that? Set all the elements in the matrices to zero? Someone told me to just pass Mat() to the function, but I'm not allowed to do that (assertion error).
Wed, 04 Apr 2018 07:04:13 -0500

Comment by bendikiv:
Thanks! I still struggle to get sensible results from my algorithm (the tvec from solvePnP still doesn't make sense), but when I get it to work and can start improving the accuracy, I will look into everything you've mentioned :) I think my problems have to do with the features, most likely the 3D points or the 2D-3D correspondences, but it's hard to debug and find out exactly what's wrong.
Wed, 21 Mar 2018 15:16:51 -0500

Comment by Der Luftmensch:
Worth reading if you haven't already: [Real-Time Stereo Visual Odometry for Autonomous Ground Vehicles](https://www-robotics.jpl.nasa.gov/publications/Andrew_Howard/howard_iros08_visodom.pdf).
You may not require rotation and scale invariance, so keep that in mind and possibly choose faster and simpler algorithms for better results.
Wed, 21 Mar 2018 14:45:48 -0500

Comment by Der Luftmensch:
And in general, corners are better localized than blobs.
Thu, 15 Mar 2018 10:10:37 -0500

Comment by Der Luftmensch:
http://answers.opencv.org/question/185671/stereo-camera-pose-estimation-from-solvepnpransac-using-3d-points-given-wrt-the-camera-coordinate-system/?comment=186823#post-id-186823You don't, just pick a value. They used 0.2, meaning that 99% are within 0.6 pixels of true location. I would just use that value and once you have everything working, see what changing it does. Keep in mind that for pyramidal feature finding, those features at the smallest image/largest scale will be the worst localized in original image space. This means that your value of pixel error could vary depending on feature scale. AKAZE is multi-scale and sub-pixel. ORB is sub-pixel and single scale (I think). FAST is single scale and not sub-pixel. AGAST is multi-scale FAST. Just something to keep in mind. Also, remember that the disparity image is discrete and for a sub-pixel feature, you might want some method of interpolating a depth value for more accurate projection to X,Y,Z.Thu, 15 Mar 2018 10:10:15 -0500http://answers.opencv.org/question/185671/stereo-camera-pose-estimation-from-solvepnpransac-using-3d-points-given-wrt-the-camera-coordinate-system/?comment=186823#post-id-186823Comment by bendikiv for <p>Do PnP with previous frame 3D points (either camera or world) and current frame 2D points. If the 3D points have world coordinates you are done. If they are local camera coordinates, you must add the computed Rt transform to your global transform.</p>
http://answers.opencv.org/question/185671/stereo-camera-pose-estimation-from-solvepnpransac-using-3d-points-given-wrt-the-camera-coordinate-system/?comment=186814#post-id-186814I see, but how should I model/compute that? The paper suggests Gaussian noise, but in that case, with what parameters? Because I can't really know the real error? Or can I?Thu, 15 Mar 2018 08:23:20 -0500http://answers.opencv.org/question/185671/stereo-camera-pose-estimation-from-solvepnpransac-using-3d-points-given-wrt-the-camera-coordinate-system/?comment=186814#post-id-186814Comment by Der Luftmensch for <p>Do PnP with previous frame 3D points (either camera or world) and current frame 2D points. If the 3D points have world coordinates you are done. If they are local camera coordinates, you must add the computed Rt transform to your global transform.</p>
http://answers.opencv.org/question/185671/stereo-camera-pose-estimation-from-solvepnpransac-using-3d-points-given-wrt-the-camera-coordinate-system/?comment=186750#post-id-186750No, it is the error in the localization of the point in pixels. Pixels are a discretization of the world and there is something lost, even for points given sub-pixel values.Wed, 14 Mar 2018 17:51:36 -0500http://answers.opencv.org/question/185671/stereo-camera-pose-estimation-from-solvepnpransac-using-3d-points-given-wrt-the-camera-coordinate-system/?comment=186750#post-id-186750Comment by bendikiv for <p>Do PnP with previous frame 3D points (either camera or world) and current frame 2D points. If the 3D points have world coordinates you are done. If they are local camera coordinates, you must add the computed Rt transform to your global transform.</p>
http://answers.opencv.org/question/185671/stereo-camera-pose-estimation-from-solvepnpransac-using-3d-points-given-wrt-the-camera-coordinate-system/?comment=186745#post-id-186745OK, so numStdDev is an algorithm parameter that is chosen by the user, but what exactly is the pixelError? Is it the average deviation from the mean of all the feature points, for all feature points? Like, compute the mean of all the feature points, then find the standard deviation/how much each feature point differs from the mean, and take the average of that?
...but then again, what is the mean of the feature points?Wed, 14 Mar 2018 17:03:30 -0500http://answers.opencv.org/question/185671/stereo-camera-pose-estimation-from-solvepnpransac-using-3d-points-given-wrt-the-camera-coordinate-system/?comment=186745#post-id-186745Comment by Der Luftmensch for <p>Do PnP with previous frame 3D points (either camera or world) and current frame 2D points. If the 3D points have world coordinates you are done. If they are local camera coordinates, you must add the computed Rt transform to your global transform.</p>
http://answers.opencv.org/question/185671/stereo-camera-pose-estimation-from-solvepnpransac-using-3d-points-given-wrt-the-camera-coordinate-system/?comment=186739#post-id-186739pixelError is in pixel standard deviations. numStdDev is the size of the gate in the gating test, which allows setting the type II error. If we use a gate size of three standard deviations we will have a false negative rate of 1%, meaning 99% of true positives are accepted. Narrowing the gate to one standard deviation would give a false negative rate of 33%. The gating test is a fundamental concept in radar tracking, the Bible for which is Design and Analysis of Modern Tracking Systems by Samuel Blackman if you are interested.Wed, 14 Mar 2018 14:49:37 -0500http://answers.opencv.org/question/185671/stereo-camera-pose-estimation-from-solvepnpransac-using-3d-points-given-wrt-the-camera-coordinate-system/?comment=186739#post-id-186739Comment by bendikiv for <p>Do PnP with previous frame 3D points (either camera or world) and current frame 2D points. If the 3D points have world coordinates you are done. If they are local camera coordinates, you must add the computed Rt transform to your global transform.</p>
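The gating test described here can be sketched as a simple per-axis check, assuming independent Gaussian pixel noise with standard deviation pixelError (the function and parameter names are illustrative, not from the linked code):

```python
import numpy as np

def gate(predicted, measured, pixel_error, num_std_dev=3.0):
    """Accept a 2D measurement if its residual w.r.t. the prediction falls
    inside a gate of num_std_dev standard deviations on each axis.
    pixel_error is the assumed 1-sigma localization noise in pixels."""
    residual = np.asarray(measured, float) - np.asarray(predicted, float)
    return bool(np.all(np.abs(residual) <= num_std_dev * pixel_error))
```

With pixel_error = 0.2 and the default 3-sigma gate, this is the "within 0.6 pixels" acceptance region mentioned elsewhere in the thread.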
http://answers.opencv.org/question/185671/stereo-camera-pose-estimation-from-solvepnpransac-using-3d-points-given-wrt-the-camera-coordinate-system/?comment=186735#post-id-186735Nice, thanks!
In your new script, however, I have a problem interpreting the pixelError and numStdDev variables. The paper suggests modeling the "error delta_e in the pixel positions in the image planes by independent Gaussian noise", and I see that delta_e = pixelError, but what exactly are those variables?Wed, 14 Mar 2018 13:14:35 -0500http://answers.opencv.org/question/185671/stereo-camera-pose-estimation-from-solvepnpransac-using-3d-points-given-wrt-the-camera-coordinate-system/?comment=186735#post-id-186735Comment by Der Luftmensch for <p>Do PnP with previous frame 3D points (either camera or world) and current frame 2D points. If the 3D points have world coordinates you are done. If they are local camera coordinates, you must add the computed Rt transform to your global transform.</p>
http://answers.opencv.org/question/185671/stereo-camera-pose-estimation-from-solvepnpransac-using-3d-points-given-wrt-the-camera-coordinate-system/?comment=186605#post-id-186605[Code gist](https://gist.github.com/llschloesser/5ce412e652b5c126e18c4c40c9d31185)
The inlier/max-clique with 3D-error method is the method from the Hirschmuller paper. Once the inlier set (the max-clique) is identified, there is no need for RANSAC and so you'd just use `cv::solvePnP`. Ideally you would write your own version of the algorithm to do weighted least squares based on the uncertainty of the positions of the 3D points. I agree, the local translation between frames should be around 0. You could visualize your depth data in 3D to see how noisy it is. I also recommend using both corners and blobs, not just one or the other. ORB is a good choice for corners and AKAZE for blobs.Tue, 13 Mar 2018 12:54:19 -0500http://answers.opencv.org/question/185671/stereo-camera-pose-estimation-from-solvepnpransac-using-3d-points-given-wrt-the-camera-coordinate-system/?comment=186605#post-id-186605Comment by bendikiv for <p>Do PnP with previous frame 3D points (either camera or world) and current frame 2D points. If the 3D points have world coordinates you are done. If they are local camera coordinates, you must add the computed Rt transform to your global transform.</p>
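The inlier/max-clique step mentioned here can be approximated greedily: rigid motion preserves pairwise 3D distances, so any match whose distances to the other matches change between frames is an outlier. A sketch using a constant distance tolerance instead of the paper's error-dependent threshold (the function name and tol value are assumptions):

```python
import numpy as np

def rigid_inliers(pts_prev, pts_curr, tol=0.05):
    """Greedy max-clique approximation for rigid-motion inlier detection.
    pts_prev and pts_curr are Nx3 arrays of matched 3D points; two matches
    are 'consistent' if their pairwise distance is preserved within tol."""
    pts_prev = np.asarray(pts_prev, float)
    pts_curr = np.asarray(pts_curr, float)
    d_prev = np.linalg.norm(pts_prev[:, None] - pts_prev[None, :], axis=2)
    d_curr = np.linalg.norm(pts_curr[:, None] - pts_curr[None, :], axis=2)
    consistent = np.abs(d_prev - d_curr) < tol
    np.fill_diagonal(consistent, False)
    # Seed with the best-connected match, then grow the clique greedily.
    clique = [int(consistent.sum(axis=1).argmax())]
    candidates = set(range(len(pts_prev))) - set(clique)
    while True:
        compatible = [i for i in candidates
                      if all(consistent[i, j] for j in clique)]
        if not compatible:
            break
        nxt = max(compatible, key=lambda i: int(consistent[i].sum()))
        clique.append(nxt)
        candidates.discard(nxt)
    return sorted(clique)
```

The surviving indices would then be passed to `cv::solvePnP`, as the comment describes.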
http://answers.opencv.org/question/185671/stereo-camera-pose-estimation-from-solvepnpransac-using-3d-points-given-wrt-the-camera-coordinate-system/?comment=186597#post-id-186597Yes, I'm using the Duo3D M. I agree that by accumulating transformations frame after frame I need to expect some drift. But what I meant was that if I just print out the tvec returned from solvePnPRansac for each iteration, i.e. the local translation between two consecutive frames, then shouldn't it stay approximately at (0, 0, 0) if the points are accurate enough?
Have you uploaded your updated max-clique script anywhere? And what exactly is the "inlier/max-clique with 3D-error method"? Does it include pose estimation (solving the PnP-problem), as an alternative to cv::solvePnP(Ransac)?Tue, 13 Mar 2018 10:58:30 -0500http://answers.opencv.org/question/185671/stereo-camera-pose-estimation-from-solvepnpransac-using-3d-points-given-wrt-the-camera-coordinate-system/?comment=186597#post-id-186597Comment by Der Luftmensch for <p>Do PnP with previous frame 3D points (either camera or world) and current frame 2D points. If the 3D points have world coordinates you are done. If they are local camera coordinates, you must add the computed Rt transform to your global transform.</p>
http://answers.opencv.org/question/185671/stereo-camera-pose-estimation-from-solvepnpransac-using-3d-points-given-wrt-the-camera-coordinate-system/?comment=186389#post-id-186389My understanding is that if you have a static scene and camera, and you are accumulating transformations frame after frame, you will observe a random walk away from 0. Are you using the Duo3D? Since I posted that answer I've improved the code to take into account 3D-error. For frame-to-frame VO, my experience is that the inlier/max-clique with 3D-error method, though a little slower, provides more accurate results than RANSAC PnP, though it has more regular catastrophic errors (probably my fault somewhere). Hirschmuller also suggests an angular rejection test, which should be simple enough to add. It would be no mean feat to do keyframe VO and weighted PnP, either via RANSAC or inlier detection. You might also want to consider `cv::matchGMS` as a 2D-outlier rejection scheme.Fri, 09 Mar 2018 10:55:21 -0600http://answers.opencv.org/question/185671/stereo-camera-pose-estimation-from-solvepnpransac-using-3d-points-given-wrt-the-camera-coordinate-system/?comment=186389#post-id-186389Comment by bendikiv for <p>Do PnP with previous frame 3D points (either camera or world) and current frame 2D points. If the 3D points have world coordinates you are done. If they are local camera coordinates, you must add the computed Rt transform to your global transform.</p>
http://answers.opencv.org/question/185671/stereo-camera-pose-estimation-from-solvepnpransac-using-3d-points-given-wrt-the-camera-coordinate-system/?comment=186364#post-id-186364I will look into the accuracy of the 3D points, maybe try to visualize them in some way. I've also asked the developers of the stereo camera about the accuracy of the 3D points, since I get them from a function in an API/SDK provided by them. The baseline of the camera is only 30 mm with a stated operating range of 2.5 meters, so I try to keep the scene close to the lenses (<0.5 m).
But, just to make sure and to have some sort of goal: The tvec output from solvePnPRansac should stay close to (0, 0, 0) if I don't move the camera?
Side note: I just realized that I've been using your code provided in this answer for the implementation of a max clique approximation! http://answers.opencv.org/question/75437/max-clique-approximation-cvmat-summation/Fri, 09 Mar 2018 03:42:51 -0600http://answers.opencv.org/question/185671/stereo-camera-pose-estimation-from-solvepnpransac-using-3d-points-given-wrt-the-camera-coordinate-system/?comment=186364#post-id-186364Comment by Der Luftmensch for <p>Do PnP with previous frame 3D points (either camera or world) and current frame 2D points. If the 3D points have world coordinates you are done. If they are local camera coordinates, you must add the computed Rt transform to your global transform.</p>
http://answers.opencv.org/question/185671/stereo-camera-pose-estimation-from-solvepnpransac-using-3d-points-given-wrt-the-camera-coordinate-system/?comment=186346#post-id-186346If you are using `cv::solvePnPRansac`, then there is little reason to first do inlier detection (as opposed to outlier rejection) unless you believe that the ratio of inliers to outliers is small. You also might have a very noisy 3D image or one with large errors if the baseline is small. If the 3D positions are not stable then you might observe the strange results you mention. Also, keep in mind that `cv::solvePnPRansac` does no weighting of points. The 3D error increases quadratically as one moves away from the camera and so in reality these points really should be given a much reduced weighting, but OpenCV does not (yet) provide such a capability. Maybe there are some visualizations you can add to better understand what is going on? Try a 3D rigid body transformation of current points?Thu, 08 Mar 2018 18:16:44 -0600http://answers.opencv.org/question/185671/stereo-camera-pose-estimation-from-solvepnpransac-using-3d-points-given-wrt-the-camera-coordinate-system/?comment=186346#post-id-186346Comment by bendikiv for <p>Do PnP with previous frame 3D points (either camera or world) and current frame 2D points. If the 3D points have world coordinates you are done. If they are local camera coordinates, you must add the computed Rt transform to your global transform.</p>
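The quadratic growth of the 3D error mentioned above follows from the stereo relation z = f*b/d: a disparity error dd maps to a depth error dz ≈ z²/(f*b)*dd. A tiny sketch (the numeric parameters in the test are assumptions, not the camera's actual calibration):

```python
def depth_sigma(z, focal_px, baseline, disp_sigma=0.2):
    """Approximate 1-sigma depth error of a stereo point at depth z.
    From z = f*b/d, a disparity error disp_sigma (pixels) gives
    dz ~= z**2 / (f*b) * disp_sigma, so the error grows quadratically
    with distance. focal_px is in pixels, baseline and z in meters."""
    return z * z / (focal_px * baseline) * disp_sigma
```

This is the quantity a weighted PnP would use to down-weight distant points.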
http://answers.opencv.org/question/185671/stereo-camera-pose-estimation-from-solvepnpransac-using-3d-points-given-wrt-the-camera-coordinate-system/?comment=186335#post-id-186335OK, I opened another question about composeRT and it makes more sense to me now. However, I still wonder about the tvec returned from solvePnPRansac. Yes, rvec and tvec bring points from model coordinates to the camera frame. But, if I keep the camera absolutely still and just print out tvec, shouldn't tvec remain somewhat close to (0, 0, 0) when I use previous 3D points (wrt the camera frame) and current corresponding 2D points? Because now it doesn't do that at all (some of it I recognize as noise, but not all of it).Thu, 08 Mar 2018 10:58:11 -0600http://answers.opencv.org/question/185671/stereo-camera-pose-estimation-from-solvepnpransac-using-3d-points-given-wrt-the-camera-coordinate-system/?comment=186335#post-id-186335Comment by Der Luftmensch for <p>Do PnP with previous frame 3D points (either camera or world) and current frame 2D points. If the 3D points have world coordinates you are done. If they are local camera coordinates, you must add the computed Rt transform to your global transform.</p>
http://answers.opencv.org/question/185671/stereo-camera-pose-estimation-from-solvepnpransac-using-3d-points-given-wrt-the-camera-coordinate-system/?comment=185875#post-id-185875You have the freedom of choice to put these things together however you choose. `rvec` and `tvec` from `cv::solvePnPRansac` bring the model coords to the camera frame, so you will likely end up inverting the rotation matrix and negating the translation. Using the current frame's 2D points and the previous frame's 3D points is necessary for keyframe 3D-data fusion, however, you could swap it around and still do frame-to-frame VO just fine. I hope that helps. Just print out your values (translation and euler angles) for a short sequence and verify it makes sense.Fri, 02 Mar 2018 08:40:52 -0600http://answers.opencv.org/question/185671/stereo-camera-pose-estimation-from-solvepnpransac-using-3d-points-given-wrt-the-camera-coordinate-system/?comment=185875#post-id-185875Comment by bendikiv for <p>Do PnP with previous frame 3D points (either camera or world) and current frame 2D points. If the 3D points have world coordinates you are done. If they are local camera coordinates, you must add the computed Rt transform to your global transform.</p>
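"Inverting the rotation matrix and negating the translation" can be written out explicitly once rvec has been converted to a 3x3 matrix (e.g. with cv::Rodrigues); a minimal numpy sketch:

```python
import numpy as np

def invert_pose(R, t):
    """(R, t) maps model/world coords into the camera frame: x_cam = R @ x + t.
    The inverse transform (the camera's pose in the model frame) is
    therefore (R.T, -R.T @ t), since R is orthonormal."""
    R_inv = R.T
    t_inv = -R_inv @ np.asarray(t, float).reshape(3)
    return R_inv, t_inv
```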
http://answers.opencv.org/question/185671/stereo-camera-pose-estimation-from-solvepnpransac-using-3d-points-given-wrt-the-camera-coordinate-system/?comment=185815#post-id-185815*I can compute what rvec3/tvec3 equals from the formulas given in the documentation.
BUT, there is one thing that I've never quite understood, and would appreciate if you could help me with my confusion (because this affects the results from composeRT):
The rvec and tvec returned by solvePnPRansac, what exactly do they represent? What I've believed:
- rvec/the rotation matrix formed by rvec represents the rotation *from* the previous coordinate frame *to* the current frame.
- tvec is the vector *from* the origin of the current frame *to* the origin of the previous frame, with coordinates given in the *current* frame/coordinate system. I.e. t^n_{n, n-1}, if that notation makes sense to you.
Is this a correct interpretation of rvec and tvec returned by solvePnPRansac?Thu, 01 Mar 2018 10:24:43 -0600http://answers.opencv.org/question/185671/stereo-camera-pose-estimation-from-solvepnpransac-using-3d-points-given-wrt-the-camera-coordinate-system/?comment=185815#post-id-185815Comment by bendikiv for <p>Do PnP with previous frame 3D points (either camera or world) and current frame 2D points. If the 3D points have world coordinates you are done. If they are local camera coordinates, you must add the computed Rt transform to your global transform.</p>
http://answers.opencv.org/question/185671/stereo-camera-pose-estimation-from-solvepnpransac-using-3d-points-given-wrt-the-camera-coordinate-system/?comment=185812#post-id-185812Thanks again.
I've actually been doing the same inlier detection step as they do in the paper (been kind of following this approach, where the same inlier detection step is done in step 6: https://avisingh599.github.io/vision/visual-odometry-full/). However, I noticed that I've been doing it on the 2D feature points, and I understand that I need to detect inliers among the 3D points. As of now, I only have a constant number as the "distance error threshold" for determining if a point is an inlier or outlier, but I will look into the more "dynamical" approach described in the paper (eq. 10).
Just to be sure that I use composeRT correctly: rvec1/tvec1 equals the global transform, while rvec2/tvec2 is the relative one? Which gives me the transformation *from* initial *to* the current frame?Thu, 01 Mar 2018 09:27:38 -0600http://answers.opencv.org/question/185671/stereo-camera-pose-estimation-from-solvepnpransac-using-3d-points-given-wrt-the-camera-coordinate-system/?comment=185812#post-id-185812Comment by Der Luftmensch for <p>Do PnP with previous frame 3D points (either camera or world) and current frame 2D points. If the 3D points have world coordinates you are done. If they are local camera coordinates, you must add the computed Rt transform to your global transform.</p>
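For reference, cv::composeRT applies its first argument pair before the second: the documented composition is R3 = R2·R1 and t3 = R2·t1 + t2. That convention can be checked against a plain 4x4 matrix product (numpy sketch; matrix inputs are used here instead of the rvecs that composeRT actually takes):

```python
import numpy as np

def compose_rt(R1, t1, R2, t2):
    """Matrix form of cv::composeRT's composition: the (R1, t1) transform
    is applied first, then (R2, t2): R3 = R2 @ R1, t3 = R2 @ t1 + t2."""
    return R2 @ R1, R2 @ t1 + t2
```

So with the global transform as (rvec1, tvec1) and the new relative motion as (rvec2, tvec2), the composed result is the updated global transform.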
http://answers.opencv.org/question/185671/stereo-camera-pose-estimation-from-solvepnpransac-using-3d-points-given-wrt-the-camera-coordinate-system/?comment=185732#post-id-185732Use `cv::composeRT` to concatenate your transformations. To reduce noise look into using keyframes where results are aggregated before moving on to the next keyframe. You'll need a concept of stereo error. I recommend reading "[Fast, Unconstrained Camera Motion Estimation from Stereo without Tracking and Robust Statistics](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.10.6052&rep=rep1&type=pdf)" by Heiko Hirschmuller.Wed, 28 Feb 2018 13:54:56 -0600http://answers.opencv.org/question/185671/stereo-camera-pose-estimation-from-solvepnpransac-using-3d-points-given-wrt-the-camera-coordinate-system/?comment=185732#post-id-185732Comment by bendikiv for <p>Do PnP with previous frame 3D points (either camera or world) and current frame 2D points. If the 3D points have world coordinates you are done. If they are local camera coordinates, you must add the computed Rt transform to your global transform.</p>
http://answers.opencv.org/question/185671/stereo-camera-pose-estimation-from-solvepnpransac-using-3d-points-given-wrt-the-camera-coordinate-system/?comment=185716#post-id-185716Thanks! Reading the articles helped me with further understanding of the concept! Especially regarding what 2D-3D points to use when solving the PnP.
So now I'm using previous frame 3D points in camera coordinates and current frame 2D points at each iteration in solvePnP. The results seem to be OK; however, they contain a lot of noise, which makes it hard to tell if the global transform is correct.
If I construct a transformation matrix like this with each R and t (R gotten from Rodrigues and t=tvec): T_current = [R t; 0 1], and I have my T_global (which initially is a 4x4 identity matrix) is this the correct update formula: T_global = T_global * T_current? Does T_global now represent the transformation from the initial frame to the current frame? Is T_current = [R t; 0 1] correct?Wed, 28 Feb 2018 12:01:51 -0600http://answers.opencv.org/question/185671/stereo-camera-pose-estimation-from-solvepnpransac-using-3d-points-given-wrt-the-camera-coordinate-system/?comment=185716#post-id-185716
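The bookkeeping described in the question can be sketched directly (to_homogeneous and accumulate are illustrative names, not OpenCV API):

```python
import numpy as np

def to_homogeneous(R, t):
    """Pack (R, t) into the 4x4 transform T = [R t; 0 1] from the question."""
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = np.asarray(t, float).reshape(3)
    return T

def accumulate(motions):
    """Chain per-frame transforms with the update T_global = T_global @ T_current,
    starting from the identity (the initial camera pose as world frame)."""
    T_global = np.eye(4)
    for R, t in motions:
        T_global = T_global @ to_homogeneous(R, t)
    return T_global
```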