# Using triangulatePoints() and projectPoints() with stereo cameras

Hi, I have a stereo camera setup for which I am doing relative pose estimation of one camera w.r.t. the other using the essential matrix. Once I have computed the pose, I want to refine it, so I was thinking of triangulating the matched points into 3D space, reprojecting them into the images and computing the reprojection error. Eventually I would also like to use solvePnP to refine my pose. (I was trying bundle adjustment, but the results were awry, possibly because my extrinsics change over time.)

The thing I am confused by is which R and t are used by the triangulatePoints() and projectPoints() functions. When I am passing the projection matrices to triangulatePoints(), do I just pass K1[I|0] for the first camera and K2[R|t] for the second? And does projectPoints() require the R and t between the camera and the object, or between camera1 and camera2? Any suggestions would be helpful.

Thanks!



You are trying to mix two different approaches for pose estimation: 2D-to-2D (via the essential matrix) and 2D-to-3D using triangulated 3D points. solvePnP is an algorithm for the latter approach. Bundle adjustment minimizes the reprojection error jointly over the set of camera poses and 3D points, so extrinsics that change from view to view are not a problem; they are exactly what bundle adjustment estimates. The tutorials here and here give a very good overview of the topic of visual odometry.

Usually, the 3D coordinate frame is defined such that its origin is at the center of the first camera, so in this sense, yes, K1[I|0] for the first camera and K2[R|t] for the second camera is correct. The first time you triangulate the points, the obtained 3D points are expressed in the 3D coordinate frame defined by the first camera. Therefore, when you want to reproject the 3D points into, e.g., view 3 in order to calculate the reprojection error, you need to know the pose of the third camera w.r.t. the first view.

Hope this helps.


Thanks for the reply. Just to confirm: in my application there are two cameras, but camera2 does not stay in one place; hence the issue of estimating its position relative to camera1 (and subsequently trying to stabilize it at a certain baseline: part 2). As the scale keeps changing for every pair of frames I analyze with the essential matrix, I thought bundle adjustment would not work. Do you still think BA would work for my system? Perhaps I used the word extrinsics wrongly: I meant the baseline would not be constant either.

( 2015-08-10 03:01:05 -0600 )

The camera extrinsics are the pose of the camera w.r.t. an (arbitrary, user-defined) 3D coordinate system, so for a 2-camera setup with the 3D coordinate system defined at the center of the first camera, the translation part of the second camera's extrinsics corresponds to the baseline when speaking in terms of a stereo setup. You're right about the change of scale: 2D-to-2D via the essential matrix normalizes the translation vector to unit length for each pair of views. So what you could do is use 2D-to-2D to get an initial estimate of the pose, then use this estimate to triangulate the points. Once you have established this initial set of poses and 3D points, you can use PnP for further pose estimation (with a scale consistent with that initial reconstruction). Then you can use BA to minimize the reprojection error.

( 2015-08-10 03:11:30 -0600 )

I see. That makes things clearer. A follow-up question: You said "you can use PnP for further pose estimation (with correct scale)." I don't have any objects of known size etc. in the scene to recover the scale information: at the end of the day, I will be using another sensor (GPS/IMU) to get metric translations (those sensors not being precise enough by themselves to do this job). So throughout the whole essential matrix->triangulation->PnP->BA pipeline, I would still be working with an arbitrary scale. Would that cause any issues for my end goal?

( 2015-08-10 03:18:04 -0600 )

You'll get a correct estimation/scene reconstruction up to scale. So the relative translations between the cameras and the positions of the triangulated 3D points will be correct with respect to the translation from the first to the second camera, which is assumed to have length 1. You will obtain an up-to-scale metric reconstruction, which unfortunately does not carry any absolute scale information. BA will work without that information, as it refines your poses and 3D points up to scale as well. To get the absolute scale, you either need prior knowledge about the size of an object in the scene or, as you plan, you could use the information from GPS/IMU. This can be included after the BA by multiplying all your translations and 3D point coordinates by the length of the first translation vector as estimated from GPS/IMU.
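A small NumPy sketch of that rescaling, under the assumption that the GPS/IMU gives a metric length for the first baseline. The value `baseline_m`, the intrinsics, and the points are hypothetical. It also shows why the images alone cannot fix the scale: scaling the translation and the points by the same factor leaves every projection unchanged.

```python
import numpy as np

# Hypothetical intrinsics and an up-to-scale reconstruction (illustrative values).
K = np.array([[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]])
R = np.eye(3)
t_unit = np.array([1.0, 0.0, 0.0])                       # unit length, from the essential matrix
X_unit = np.array([[0.2, -0.1, 5.0], [-0.5, 0.3, 7.0]])  # points in the same arbitrary scale

def project(K, R, t, X):
    """Pinhole projection of Nx3 points, no lens distortion."""
    x = (K @ (R @ X.T + t.reshape(3, 1))).T
    return x[:, :2] / x[:, 2:3]

# Assumed metric length of the first baseline, e.g. from GPS/IMU (made-up number).
baseline_m = 0.35
s = baseline_m / np.linalg.norm(t_unit)

# Apply the same factor to every translation and every 3D point.
t_metric = s * t_unit
X_metric = s * X_unit

# The projections are identical before and after scaling.
before = project(K, R, t_unit, X_unit)
after = project(K, R, t_metric, X_metric)
print(np.abs(before - after).max())
```

Rotations are unaffected by the scale factor, so only translations and point coordinates need to be multiplied.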

( 2015-08-10 03:26:22 -0600 )

Got it. Thanks a lot, that was very helpful!

( 2015-08-10 03:42:42 -0600 )
