I'm trying to use solvePnPRansac to calculate the motion of a stereo camera from one frame/time instance to the next. I detect features in the first frame and track them to the next frame. The stereo camera comes with an SDK that provides the corresponding 3D coordinates of the detected features. **These 3D points are given with respect to the camera coordinate system**. In other words, the 2D/3D points in two consecutive frames correspond to the same features, but if the camera moves between the frames, their coordinates change (even the 3D points, since they are relative to the camera origin).
I believe that the 3D input points of solvePnPRansac should be wrt a world frame, but since I don't have a world frame, I try to do the following:
1) For the very first frame: I set the initial camera pose as the world frame, since I need a constant reference for computing relative movement. This means that the 3D points calculated in this frame now equal the world points, and that the movement of the camera is relative to the initial camera pose.
2) Call solvePnPRansac with the world points from the first frame together with the 2D features detected in the second frame as inputs. It returns rvec and tvec.
**Now for my first question:** Is tvec the vector from the camera origin (/the second frame) to the world origin (/the first frame), given in the camera's coordinate system?
**Second question:** I want the vector from the world frame to the camera/second frame given in world frame coordinates (this should equal the translation of the camera relative to the original pose, i.e. the world frame). Do I then need to use **translation = -(R)^T * tvec**, where R is the rotation matrix given by rvec?
Now I'm a little confused as to which 3D points I should use in the further calculations. Should I transform the 3D points detected in the second frame (which are given with respect to the camera) to the world frame? If I combine tvec and rvec into a homogeneous transformation matrix T (which would represent the transformation from the world frame to the second frame), the transformation should be
**3Dpoints_at_frame2_in_worldcoordinates = T^(-1) * 3Dpoints_at_frame2_in_cameracoordinates**
If I do this, I can capture a new image (third frame), track the 2D features detected in the second frame to the third frame, compute the corresponding 3D points (which are given with respect to the third frame), and call solvePnPRansac with "3Dpoints_at_frame2_in_worldcoordinates" and the 2D features in the third frame as input. The returned rvec and tvec represent the conversion of world points to the third frame, i.e. if I use the same formula as in my second question I would get the absolute movement from the world frame to the third frame. And if I create a new homogeneous transformation matrix, I can convert the 3D points of the third frame into world coordinates and use these points in the next iteration (fourth frame). **Does this make any sense?**
Read the following:
[Visual Odometry: Part 1](https://www.ifi.uzh.ch/dam/jcr:5759a719-55db-4930-8051-4cc534f812b1/VO_Part_I_Scaramuzza.pdf)
and
When I look up the 3D coordinates of a 2D feature in the 3D image computed by `cv::reprojectImageTo3D()`, using the syntax `Point3f feature_point_3D = image_3D.at<Point3f>(u, v)`, where (u, v) are the pixel coordinates (x, y) of the 2D feature and image_3D is of type `CV_32FC3`, I had to switch the indexing to (v, u), i.e. `image_3D.at<Point3f>(v, u)`, because `cv::Mat::at` takes (row, column).
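To illustrate why the index order matters, here is a small stand-in for a row-major `CV_32FC3` image. `Image3D`, `Point3` and `lookupFeature3D` are invented for this sketch and are not OpenCV types; only the (row, column) argument order mirrors `cv::Mat::at`:

```cpp
#include <cstddef>
#include <vector>

struct Point3 { float x, y, z; };

// Minimal stand-in for a CV_32FC3 cv::Mat with row-major storage.
struct Image3D {
    int rows, cols;
    std::vector<Point3> data;
    Image3D(int r, int c)
        : rows(r), cols(c), data(static_cast<std::size_t>(r) * c) {}
    // Same argument order as cv::Mat::at: (row, col), i.e. (y, x).
    Point3& at(int row, int col) {
        return data[static_cast<std::size_t>(row) * cols + col];
    }
};

// Looks up the 3D point for a feature at pixel (u, v) = (x, y).
Point3 lookupFeature3D(Image3D& img, int u, int v) {
    return img.at(v, u);  // row = v (y), column = u (x)
}
```

On a non-square image, calling `at(u, v)` instead would read a different pixel or run out of bounds.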
I see, thanks. And then I guess that it's the same for `cv::projectPoints()`, using P1 and a zeroed `distCoeffs` matrix? I would like to project the 3D points back into 2D to see if they match the original 2D points.
    struct DUO_STEREO
    {
        double M1[9], M2[9];   // 3x3 - Camera matrices (left, right)
        double D1[8], D2[8];   // 1x8 - Camera distortion parameters (left, right)
        double R[9];           // 3x3 - Rotation between left and right camera
        double T[3];           // 3x1 - Translation vector between left and right camera
        double R1[9], R2[9];   // 3x3 - Rectified rotation matrices (left, right)
        double P1[12], P2[12]; // 3x4 - Rectified projection matrices (left, right)
        double Q[16];          // 4x4 - Disparity to depth mapping matrix
    };
Nice, thanks!
[Code gist](https://gist.github.com/llschloesser/5ce412e652b5c126e18c4c40c9d31185)
Yes, I'm using the Duo3D M. I agree that by accumulating transformations frame after frame I need to expect some drift. But what I meant was that if I just print out the tvec returned from solvePnPRansac for each iteration, i.e. the local translation between two consecutive frames, then shouldn't it stay approximately at (0, 0, 0) if the points are accurate enough?
I will look into the accuracy of the 3D points, maybe try to visualize them in some way. I've also asked the developers of the stereo camera about the accuracy of the 3D points, since I get them from a function in an API/SDK provided by them. The baseline of the camera is only 30 mm with a stated operating range of 2.5 meters, so I try to keep the scene close to the lenses (<0.5 m).
But, just to make sure and to have some sort of goal: The tvec output from solvePnPRansac should stay close to (0, 0, 0) if I don't move the camera?
I can compute what rvec3/tvec3 equal from the formulas given in the documentation.
BUT, there is one thing that I've never quite understood, and would appreciate if you could help me with my confusion (because this affects the results from composeRT):
The rvec and tvec returned by solvePnPRansac, what exactly do they represent? What I've believed:
- rvec/the rotation matrix formed by rvec represents the rotation *from* the previous coordinate frame *to* the current frame.
- tvec is the vector *from* the origin of the current frame *to* the origin of the previous frame, with coordinates given in the *current* frame/coordinate system. I. e. t^n_{n, n-1}, if that notation makes sense to you.
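For comparison, the convention documented for solvePnP is that rvec and tvec bring points from the model (here: previous-frame) coordinate system into the camera (current-frame) coordinate system. Written in the notation used above, with $X^{n-1}$ and $X^{n}$ as assumed symbols for the same point in the previous and current frame:

$$X^{n} = R\,X^{n-1} + t$$

Setting $X^{n-1} = 0$ gives $X^{n} = t$, so $t$ is the origin of the previous frame expressed in the current frame, i.e. $t = t^{n}_{n,n-1}$. Conversely, setting $X^{n} = 0$ gives the origin of the current frame expressed in the previous frame:

$$X^{n-1} = -R^{\top} t$$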
Thanks again.
I've actually been doing the same inlier detection step as they do in the paper (I've been more or less following this approach, where the same inlier detection is done in step 6: https://avisingh599.github.io/vision/visual-odometry-full/). However, I noticed that I've been doing it on the 2D feature points, and I understand that I need to detect inliers among the 3D points. As of now, I only use a constant "distance error threshold" to decide whether a point is an inlier or an outlier, but I will look into the more dynamic approach described in the paper (eq. 10).
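As an illustration of testing inliers on the 3D points rather than on the 2D features, here is a sketch of a pairwise rigidity check: for a static scene, the distance between two 3D points should be the same in the previous and the current frame. This is an assumed reading of the inlier step referred to above, using a constant threshold; the names and the threshold are hypothetical:

```cpp
#include <cmath>

struct Point3 { double x, y, z; };

// Euclidean distance between two 3D points.
double euclideanDistance(const Point3& a, const Point3& b) {
    const double dx = a.x - b.x, dy = a.y - b.y, dz = a.z - b.z;
    return std::sqrt(dx * dx + dy * dy + dz * dz);
}

// Two tracked features are mutually consistent if the distance between
// them changes by less than maxError from the previous to the current frame.
bool isRigidPair(const Point3& prevA, const Point3& prevB,
                 const Point3& curA, const Point3& curB, double maxError) {
    const double before = euclideanDistance(prevA, prevB);
    const double after = euclideanDistance(curA, curB);
    return std::fabs(before - after) < maxError;
}
```

A full inlier selection would evaluate this for all pairs and keep a large mutually consistent subset; the sketch only shows the per-pair test.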
Thanks! Reading the articles helped me with further understanding of the concept! Especially regarding what 2D-3D points to use when solving the PnP.
So now I'm using previous frame 3D points in camera coordinates and current frame 2D points at each iteration in solvePnP. The results seem to be ok, however they contain lots of noise, and that makes it hard to tell if the global transform is correct.
If I construct a transformation matrix from each R and t (R obtained from Rodrigues and t = tvec) as T_current = [R t; 0 1], and T_global is initially a 4x4 identity matrix, is T_global = T_global * T_current the correct update formula? Does T_global then represent the transformation from the initial frame to the current frame? And is T_current = [R t; 0 1] correct?
Wed, 28 Feb 2018 12:01:51 -0600