
Stereo Calibration - what is optimised?

asked 2016-02-24 23:18:19 -0500

MattClarkson

updated 2016-02-25 01:47:25 -0500

In the stereo camera calibration code, I'm a bit confused as to what parameters are optimised. If we ignore intrinsic parameters for now, is it:

(a) The left (or equivalently right) extrinsic (chessboard to camera) matrices and 6 parameters for the left-to-right transformation? If so, what are we taking the median of?

(b) The left and right extrinsic matrices, and then we take the median of the implied left-to-right transformation?

Any help would be much appreciated.




1 answer


answered 2016-02-25 02:57:31 -0500

AJW

Disclaimer: I'm not that familiar with the function, and don't have time to dig into it right now :( so this will be a bit speculative. Hopefully someone else weighs in with more authority.

One of the outputs of stereoCalibrate is the (R,t) rigid transform relating the two camera coordinate frames, so I would guess that these are directly optimised. The cameras then need to be related to the target (chessboard) coordinate system, so there is probably a collection of transformations from one camera's coordinate system to the target (or vice versa).

From 'Learning OpenCV', I think where the median comes in is the generation of the initial guess for (R,t) between cameras, that is then refined during the optimization process. In a single-camera calibration, you get a transform relating camera and target, and by combining these from each camera, you can get an estimate of (R,t). But remember that single camera calibration gives a transform between the camera and every distinct target pose, corresponding to an image. So each pair of stereo images implies some estimate of (R,t). I think you then take the median of these estimates to get the initial guess.
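A rough sketch of that initial-guess step, for illustration only. It assumes each single-camera pose maps target coordinates into that camera's frame (x_cam = R * x_target + t), and that each per-pair estimate is stored as an axis-angle rotation vector plus a translation. The names `RigidTransform`, `impliedLeftToRight` and `componentwiseMedian` are made up for this example and are not OpenCV API; the conversion between a rotation matrix and its axis-angle vector is left out.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

using Vec3 = std::array<double, 3>;
using Mat3 = std::array<std::array<double, 3>, 3>;

// Hypothetical helper type: x_out = R * x_in + t
struct RigidTransform { Mat3 R; Vec3 t; };

// Left-to-right transform implied by one stereo image pair:
//   R = R_right * R_left^T,   t = t_right - R * t_left
RigidTransform impliedLeftToRight(const RigidTransform& left,
                                  const RigidTransform& right)
{
    RigidTransform out{};  // zero-initialised
    for (int i = 0; i < 3; ++i)
        for (int j = 0; j < 3; ++j)
            for (int k = 0; k < 3; ++k)
                out.R[i][j] += right.R[i][k] * left.R[j][k];  // left.R transposed
    for (int i = 0; i < 3; ++i) {
        out.t[i] = right.t[i];
        for (int k = 0; k < 3; ++k)
            out.t[i] -= out.R[i][k] * left.t[k];
    }
    return out;
}

// Component-wise median over the per-pair estimates, each stored as
// (rx, ry, rz, tx, ty, tz) with the rotation as an axis-angle vector.
std::array<double, 6> componentwiseMedian(
    const std::vector<std::array<double, 6>>& estimates)
{
    assert(!estimates.empty());
    std::array<double, 6> med{};
    std::vector<double> column(estimates.size());
    for (std::size_t c = 0; c < 6; ++c) {
        for (std::size_t i = 0; i < estimates.size(); ++i)
            column[i] = estimates[i][c];
        std::sort(column.begin(), column.end());
        const std::size_t n = column.size();
        med[c] = (n % 2 == 1) ? column[n / 2]
                              : 0.5 * (column[n / 2 - 1] + column[n / 2]);
    }
    return med;
}
```

The median is only meant to give a starting point that is robust to a few bad single-camera poses; the optimisation would then refine it.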

I'm pretty sure you wouldn't do (b) because the optimization would be over-parameterized; more transformations than are needed to model the system. I don't think medians would be taken after the optimization, as the new values would not have been explicitly optimized, so may not agree nicely with e.g. the intrinsics that were used.

Hope this helps in some small way....




Thanks for your answer. I agree, so let's discard option (b). However, if I look through the code I see:

    // we optimize for the inter-camera R(3),t(3), then, optionally,
    // for intrinisic parameters of each camera ((fx,fy,cx,cy,k1,k2,p1,p2) ~ 8 parameters).
    nparams = 6*(nimages+1) + (recomputeIntrinsics ? NINTRINSIC*2 : 0);

so, I think that we have option:

(c) Use cvFindExtrinsicCameraParams2 to find extrinsic parameters for each camera. This gives an implied (R,t) between the left and right camera. Take the median of the implied (R,t) transformations between the cameras as a starting estimate. Optimise all extrinsic parameters. So for 10 images, that's 6*(10+1) = 66 parameters (ignoring intrinsics for now).

Which still sounds like a lot of parameters.
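To make the arithmetic concrete, here is a small sketch of the count from the quoted formula. The function name `stereoParamCount` is made up, and the number of intrinsic parameters per camera is passed in as an argument, since the actual value of NINTRINSIC is not shown in the snippet (the comment suggests roughly 8).

```cpp
#include <cassert>

// Number of unknowns according to the quoted formula:
//   6 per image pair (one camera-to-target pose each)
//   + 6 for the inter-camera (R,t)
//   + optionally the intrinsics of both cameras.
// nIntrinsicPerCamera is an assumption supplied by the caller.
int stereoParamCount(int nimages, bool recomputeIntrinsics,
                     int nIntrinsicPerCamera)
{
    assert(nimages > 0);
    const int extrinsic = 6 * (nimages + 1);
    const int intrinsic = recomputeIntrinsics ? 2 * nIntrinsicPerCamera : 0;
    return extrinsic + intrinsic;
}
```

For example, `stereoParamCount(10, false, 8)` gives 66, and with intrinsics recomputed at 8 per camera it gives 82.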



MattClarkson ( 2016-02-25 06:57:35 -0500 )

"A lot" has to be considered relative to the problem model.

You have a set of corner pixel coordinates, and the 3D coordinates of the corresponding physical points, expressed in the target's local coordinate system. The goal is to (virtually) project those 3D points onto the images, and compare where they landed with where you actually detected the corners in the images.

Now, in order to project 3D points onto the cameras, they need to be expressed in the camera's coordinate system. So you need a rigid transform from target system to camera. In each stereo image pair, the target is physically in a different pose, so we need a different transformation for each pose to be able to use the associated data. So that's 10 (R,t) transforms for 10 images, and one more to link the cameras.
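As a sketch of that geometry: one target-to-left transform per image pair, plus a single left-to-right transform shared by all pairs, is enough to project a target point into both images. The code assumes an undistorted pinhole model, and all names (`Pinhole`, `apply`, `project`, `pairResidualSq`) are invented for this example rather than taken from OpenCV.

```cpp
#include <array>
#include <cassert>

using Vec2 = std::array<double, 2>;
using Vec3 = std::array<double, 3>;
using Mat3 = std::array<std::array<double, 3>, 3>;

struct RigidTransform { Mat3 R; Vec3 t; };   // x_out = R * x_in + t
struct Pinhole { double fx, fy, cx, cy; };   // lens distortion ignored here

Vec3 apply(const RigidTransform& T, const Vec3& p)
{
    Vec3 q = T.t;
    for (int i = 0; i < 3; ++i)
        for (int k = 0; k < 3; ++k)
            q[i] += T.R[i][k] * p[k];
    return q;
}

Vec2 project(const Pinhole& cam, const Vec3& p)
{
    assert(p[2] > 0.0);  // point must be in front of the camera
    return {cam.fx * p[0] / p[2] + cam.cx, cam.fy * p[1] / p[2] + cam.cy};
}

// Squared reprojection error of one target point in one stereo pair.
double pairResidualSq(const Vec3& targetPoint,
                      const RigidTransform& targetToLeft,  // one per image pair
                      const RigidTransform& leftToRight,   // shared by all pairs
                      const Pinhole& leftCam, const Pinhole& rightCam,
                      const Vec2& seenLeft, const Vec2& seenRight)
{
    const Vec3 inLeft = apply(targetToLeft, targetPoint);
    const Vec3 inRight = apply(leftToRight, inLeft);  // chained, not a free pose
    const Vec2 pl = project(leftCam, inLeft);
    const Vec2 pr = project(rightCam, inRight);
    const double dxl = pl[0] - seenLeft[0], dyl = pl[1] - seenLeft[1];
    const double dxr = pr[0] - seenRight[0], dyr = pr[1] - seenRight[1];
    return dxl * dxl + dyl * dyl + dxr * dxr + dyr * dyr;
}
```

Summing this residual over every corner in every pair gives a cost that depends on the 10 per-pair poses and the one shared transform, which is where the 6*(nimages+1) comes from.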

AJW ( 2016-02-26 21:18:03 -0500 )

Admittedly we don't really care about the target-to-camera transformations once we have the calibration, but they are needed during the process to model the geometry of the system.

AJW ( 2016-02-26 21:25:07 -0500 )
