OpenCV Q&A Forum - RSS feed
http://answers.opencv.org/questions/
OpenCV answers
Copyright <a href="http://www.opencv.org">OpenCV foundation</a>, 2012-2018.
Fri, 26 Feb 2016 21:25:07 -0600
Stereo Calibration - what is optimised?
http://answers.opencv.org/question/88506/stereo-calibration-what-is-optimised/
In the stereo camera calibration code, I'm a bit confused as to which parameters are optimised. If we ignore the intrinsic parameters for now, is it:
(a) The left (or equivalently right) extrinsic (chessboard to camera) matrices and 6 parameters for the left-to-right transformation? If so, what are we taking the median of?
(b) The left and right extrinsic matrices, and then we take the median of the implied left-to-right transformation?
Any help would be much appreciated.
Thanks,
Matt
Wed, 24 Feb 2016 23:18:19 -0600
Answer by AJW:
http://answers.opencv.org/question/88506/stereo-calibration-what-is-optimised/?answer=88530#post-id-88530
Disclaimer: I'm not that familiar with the function, and don't have time to dig into it right now :( so this will be a bit speculative. Hopefully someone else weighs in with more authority.
One of the outputs of stereoCalibrate is the (R,t) rigid transform relating the two camera coordinate frames, so I would guess that these are directly optimised. The cameras then need to be related to the target (chessboard) coordinate system, so there is probably also a collection of transformations from one camera's coordinate system to the target (or vice versa), one per target pose.
From 'Learning OpenCV', I think the median comes in when generating the *initial guess* for (R,t) between the cameras, which is then refined during the optimization. In a single-camera calibration, you get a transform relating camera and target, and by combining these from each camera you can get an estimate of (R,t). But remember that single-camera calibration gives a transform between the camera and *every distinct target pose*, one per image. So *each pair* of stereo images implies its own estimate of (R,t). I think you then take the median of these estimates to get the initial guess.
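To make the initial-guess idea concrete, here is a rough sketch of how I picture it, not the actual OpenCV code. It assumes each per-camera pose maps target coordinates into that camera's frame (X_cam = R * X_target + t), and that the median is taken component-wise over the rotation vector and translation. The names `Pose`, `relativePose`, `rotationVector` and `initialGuess` are made up for illustration.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Vec3 = std::array<double, 3>;
using Mat3 = std::array<std::array<double, 3>, 3>;

// Pose of the target in one camera frame: X_cam = R * X_target + t.
struct Pose {
    Mat3 R;
    Vec3 t;
};

// Left-to-right transform implied by one stereo pair:
//   R = R_right * R_left^T,  t = t_right - R * t_left.
Pose relativePose(const Pose& left, const Pose& right) {
    Pose rel{};  // zero-initialised
    for (int i = 0; i < 3; ++i)
        for (int j = 0; j < 3; ++j)
            for (int k = 0; k < 3; ++k)
                rel.R[i][j] += right.R[i][k] * left.R[j][k];
    for (int i = 0; i < 3; ++i) {
        rel.t[i] = right.t[i];
        for (int k = 0; k < 3; ++k) rel.t[i] -= rel.R[i][k] * left.t[k];
    }
    return rel;
}

// Rotation matrix -> axis-angle vector. Simplified: not valid near 180 degrees,
// which should not matter for a typical stereo rig.
Vec3 rotationVector(const Mat3& R) {
    double c = (R[0][0] + R[1][1] + R[2][2] - 1.0) / 2.0;
    c = std::max(-1.0, std::min(1.0, c));
    const double angle = std::acos(c);
    const double s = std::sin(angle);
    const double scale = (s > 1e-12) ? angle / (2.0 * s) : 0.5;
    return {scale * (R[2][1] - R[1][2]),
            scale * (R[0][2] - R[2][0]),
            scale * (R[1][0] - R[0][1])};
}

double median(std::vector<double> v) {
    assert(!v.empty());
    std::sort(v.begin(), v.end());
    const std::size_t n = v.size();
    return (n % 2 == 1) ? v[n / 2] : 0.5 * (v[n / 2 - 1] + v[n / 2]);
}

// Component-wise median of the per-pair estimates: (rx, ry, rz, tx, ty, tz).
std::array<double, 6> initialGuess(const std::vector<Pose>& left,
                                   const std::vector<Pose>& right) {
    assert(left.size() == right.size() && !left.empty());
    std::array<std::vector<double>, 6> samples;
    for (std::size_t i = 0; i < left.size(); ++i) {
        const Pose rel = relativePose(left[i], right[i]);
        const Vec3 w = rotationVector(rel.R);
        for (int k = 0; k < 3; ++k) {
            samples[k].push_back(w[k]);
            samples[k + 3].push_back(rel.t[k]);
        }
    }
    std::array<double, 6> guess{};
    for (int k = 0; k < 6; ++k) guess[k] = median(samples[k]);
    return guess;
}
```

The median is only there to make the starting point robust to one or two badly estimated poses; the optimizer would then refine those six numbers together with everything else.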
I'm pretty sure you wouldn't do (b), because the optimization would be over-parameterized: it would contain more transformations than are needed to model the system. I don't think medians would be taken after the optimization either, because the resulting values would not have been explicitly optimized and so may not agree well with, e.g., the intrinsics that were used.
Hope this helps in some small way....
Thu, 25 Feb 2016 02:57:31 -0600
Comment by AJW:
http://answers.opencv.org/question/88506/stereo-calibration-what-is-optimised/?comment=88726#post-id-88726
Admittedly we don't really care about the target-to-camera transformations once we have the calibration, but they are needed during the process to model the geometry of the system.
Fri, 26 Feb 2016 21:25:07 -0600
Comment by AJW:
http://answers.opencv.org/question/88506/stereo-calibration-what-is-optimised/?comment=88725#post-id-88725
"A lot" has to be considered relative to the problem model.
You have a set of corner pixel coordinates, and the 3D coordinates of the corresponding physical points, expressed in the target's local coordinate system. The goal is to (virtually) project those 3D points onto the images, and compare where they landed with where you actually detected the corners in the images.
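As a minimal sketch of that "project and compare" step: the code below assumes an ideal pinhole camera with no lens distortion, and a rigid transform (R, t) taking target coordinates into the camera frame. The names `Intrinsics`, `toCamera`, `project` and `residual` are mine, not OpenCV's.

```cpp
#include <array>
#include <cassert>
#include <cmath>

using Vec2 = std::array<double, 2>;
using Vec3 = std::array<double, 3>;
using Mat3 = std::array<std::array<double, 3>, 3>;

// Pinhole intrinsics only; distortion coefficients are deliberately left out.
struct Intrinsics {
    double fx, fy, cx, cy;
};

// Express a target-frame point in the camera frame: X_cam = R * X_target + t.
Vec3 toCamera(const Mat3& R, const Vec3& t, const Vec3& X) {
    Vec3 Xc = t;
    for (int i = 0; i < 3; ++i)
        for (int k = 0; k < 3; ++k) Xc[i] += R[i][k] * X[k];
    return Xc;
}

// Ideal perspective projection of a camera-frame point to pixel coordinates.
Vec2 project(const Intrinsics& K, const Vec3& Xc) {
    assert(Xc[2] > 0.0);  // the point must be in front of the camera
    return {K.fx * Xc[0] / Xc[2] + K.cx, K.fy * Xc[1] / Xc[2] + K.cy};
}

// Reprojection residual: where the model puts the corner minus where it was detected.
Vec2 residual(const Intrinsics& K, const Mat3& R, const Vec3& t,
              const Vec3& targetPoint, const Vec2& detected) {
    const Vec2 p = project(K, toCamera(R, t, targetPoint));
    return {p[0] - detected[0], p[1] - detected[1]};
}
```

The optimizer's job is to pick the transforms (and optionally the intrinsics) that make these residuals small over every corner in every image.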
Now, in order to project 3D points onto the cameras, they need to be expressed in the *camera's* coordinate system. So you need a rigid transform from target system to camera. In each stereo image pair, the target is physically in a different pose, so we need a *different* transformation for *each pose* to be able to use the associated data. So that's 10 (R,t) transforms for 10 images, and one more to link the cameras.
Fri, 26 Feb 2016 21:18:03 -0600
Comment by MattClarkson:
http://answers.opencv.org/question/88506/stereo-calibration-what-is-optimised/?comment=88571#post-id-88571
Hi AJW,
thanks for your answer. I agree, so let's discard option (b). However, if I look through the code I see:
// we optimize for the inter-camera R(3),t(3), then, optionally,
// for intrinisic parameters of each camera ((fx,fy,cx,cy,k1,k2,p1,p2) ~ 8 parameters).
nparams = 6*(nimages+1) + (recomputeIntrinsics ? NINTRINSIC*2 : 0);
So I think we have a third option:
(c) Use cvFindExtrinsicCameraParams2 to find the extrinsic parameters for each camera. This gives an implied (R,t) between the left and right cameras for each image pair. Take the median of these implied (R,t) transformations as a starting estimate. Then optimise all the extrinsic parameters. So for 10 images, that's 6*(10+1) = 66 parameters (ignoring intrinsics for now).
Which still sounds like a lot of parameters.
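To put the 66 in context, here is a small sketch that mirrors the nparams expression quoted above and compares it with the number of scalar measurements. The helper names and the board size in the note below are assumptions for illustration, and the residual count assumes every corner is detected in both views of every pair.

```cpp
#include <cassert>

// Unknowns, following the quoted expression:
// 6 per target pose, 6 for the inter-camera (R,t), plus optional intrinsics per camera.
int countParams(int nimages, bool recomputeIntrinsics, int nintrinsic) {
    assert(nimages > 0 && nintrinsic >= 0);
    return 6 * (nimages + 1) + (recomputeIntrinsics ? nintrinsic * 2 : 0);
}

// Scalar residuals: 2 cameras * 2 pixel coordinates * corners per view * image pairs.
int countResiduals(int nimages, int cornersPerView) {
    assert(nimages > 0 && cornersPerView > 0);
    return 2 * 2 * cornersPerView * nimages;
}
```

For 10 image pairs, `countParams(10, false, 8)` gives the 66 above, while a hypothetical 9x6 board (54 corners) gives `countResiduals(10, 54)` = 2160 scalar residuals to constrain them.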
Thanks
Matt
Thu, 25 Feb 2016 06:57:35 -0600