Interpretation of the translation vector results of camera calibration.

asked 2019-01-25 10:49:20 -0500

sdwarfs

updated 2019-02-06 13:03:38 -0500

I am currently debugging my stereo calibration code. The pose estimates from the single camera calibration do not seem accurate enough, and the stereo rectification results are quite bad (the same object features are about 200-400 px off in the y direction).

General observations:

  1. The single camera calibration gives me a reprojection RMS of about 0.25 for both cameras when using about 30 checkerboard calibration images. More images mostly just increase computation time...
  2. The estimated focal length is ~16 mm (fx/fy multiplied by the pixel size in mm), and the "tvecs" have Z values on the order of 0.618 m (I use meters as world units) for a pattern that was ~70 cm away from the camera sensor.
  3. The estimated focal length and Z values seem to be quite stable when different images are used for calibration.
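
The unit conversions behind these numbers can be checked with a few lines. This is only a sketch under the pinhole assumption, using the 3.45 um pixel pitch and the 4000 x 2000 crop listed under "General Info"; the helper names pxToMm, mmToPx and fovDeg are made up for illustration and are not OpenCV functions.

```cpp
#include <cassert>
#include <cmath>

// Convert a length on the sensor from pixels to millimetres.
// pixel_um is the pixel pitch in micrometres (3.45 for the IMX304 per the post).
double pxToMm(double value_px, double pixel_um) {
    assert(pixel_um > 0.0);
    return value_px * pixel_um / 1000.0;
}

// Inverse conversion: millimetres on the sensor to pixels.
double mmToPx(double value_mm, double pixel_um) {
    assert(pixel_um > 0.0);
    return value_mm * 1000.0 / pixel_um;
}

// Full field of view in degrees for an image extent (pixels) and a focal
// length in pixels, assuming an ideal pinhole camera without distortion.
double fovDeg(double extent_px, double focal_px) {
    assert(focal_px > 0.0);
    const double kPi = std::acos(-1.0);
    return 2.0 * std::atan(extent_px / (2.0 * focal_px)) * 180.0 / kPi;
}
```

With the left camera values, mmToPx(16.2095, 3.45) is about 4698 px, and fovDeg(4000.0, 4698.4) comes out at about 46.1 degrees, in line with the reported fovx for the 4000 px wide crop. So the intrinsics at least appear internally consistent.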


As far as I know, the tvecs[] represent the camera position relative to the pattern in world space coordinates. Now, if I calculate the distance between the cameras in the XY plane, I get only 2.6 cm, while the cameras are about 8.7 cm away from each other. You can find the estimated values of tvecs[0] for one pattern below. The pattern was lying at a fixed position and both cameras were fixed too (one image taken from a series of static images, where the pattern was lying flat below both cameras, which were pointing downwards).


I wonder whether I interpret the tvecs[] output values correctly, or do they maybe need to be rotated first? There is no graphical representation of their meaning, so it's difficult to know. Maybe you could supply one... I used the code below to estimate roll, pitch and yaw of the cameras. How can I adapt this code to actually calculate the distance between both cameras (in case just subtracting the vectors and calculating the length of the result is wrong)?
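
For reference, OpenCV's calibration functions return each rvec/tvec pair as the transform that takes points from the pattern (object) frame into the camera frame, X_cam = R * X_pattern + t. A raw tvec is therefore the pattern origin as seen from the camera. The camera position in the pattern frame is C = -R^T * t, and the camera distance is the length of C_left - C_right. Below is a minimal sketch of that computation without OpenCV types. Vec3, Mat3, rodrigues, cameraCenter and dist3 are hypothetical names, and rodrigues is a hand-written version of the axis-angle formula that cv::Rodrigues implements.

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { double x, y, z; };
struct Mat3 { double m[3][3]; };

// Rotation vector (unit axis times angle in radians) -> rotation matrix.
Mat3 rodrigues(const Vec3& r) {
    const double theta = std::sqrt(r.x * r.x + r.y * r.y + r.z * r.z);
    assert(std::isfinite(theta));
    if (theta < 1e-12) {
        return {{{1.0, 0.0, 0.0}, {0.0, 1.0, 0.0}, {0.0, 0.0, 1.0}}};
    }
    const double x = r.x / theta, y = r.y / theta, z = r.z / theta;
    const double c = std::cos(theta), s = std::sin(theta), v = 1.0 - c;
    return {{{c + x * x * v,     x * y * v - z * s, x * z * v + y * s},
             {y * x * v + z * s, c + y * y * v,     y * z * v - x * s},
             {z * x * v - y * s, z * y * v + x * s, c + z * z * v}}};
}

// Camera centre in pattern coordinates: C = -R^T * t.
Vec3 cameraCenter(const Vec3& rvec, const Vec3& tvec) {
    const Mat3 R = rodrigues(rvec);
    // Row i of R^T is column i of R.
    return {-(R.m[0][0] * tvec.x + R.m[1][0] * tvec.y + R.m[2][0] * tvec.z),
            -(R.m[0][1] * tvec.x + R.m[1][1] * tvec.y + R.m[2][1] * tvec.z),
            -(R.m[0][2] * tvec.x + R.m[1][2] * tvec.y + R.m[2][2] * tvec.z)};
}

// Euclidean distance between two points.
double dist3(const Vec3& a, const Vec3& b) {
    const double dx = a.x - b.x, dy = a.y - b.y, dz = a.z - b.z;
    return std::sqrt(dx * dx + dy * dy + dz * dz);
}
```

Usage would be dist3(cameraCenter(rvecLeft, tvecLeft), cameraCenter(rvecRight, tvecRight)) for the same pattern view. The comparison only makes sense if both cameras detected the pattern with the same origin and axis orientation. The yaw values in the results listed here differ by roughly pi between the two cameras, so that may be worth checking.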

Calibration-Result: Left Camera

  • RMS: 0.254619
  • fx=16.2095 mm, fy=16.2205 mm, cx=2008.18 px, cy=1049.5 px
  • fovx=46.1164, fovy=24.0126, aspect ratio=1.00067, cx=6.92821 mm, cy=3.62076 mm
  • POSITION (tvecs[0]): X=-0.036155, Y=-0.00518067, Z=-0.618701
  • ROTATION (roll, pitch, yaw): 0.00080997, 0.0527651, 3.13971

Calibration-Result: Right Camera:

  • RMS: 0.24088
  • fx=16.2014 mm, fy=16.211 mm, cx=2068.66 px, cy=1012.78 px
  • fovx=46.1298, fovy=24.0285, aspect ratio=1.0006, cx=7.13689 mm, cy=3.4941 mm
  • POSITION (tvecs[0]): X=-0.0623798, Y=-0.010667, Z=-0.618921
  • ROTATION (roll, pitch, yaw): -0.00761054, -0.0694482, -0.00125793

General Info:

  • Sony IMX304 sensor (4112 x 3008, 3.45 um Pixel Size)
  • Captured Size: 4000 x 2000 (cropped, xoffset: 56, yoffset: 504)
  • C-mount lens: quite long (7-10 cm)
  • Checkerboard pattern: 7 x 9, patch size: about 16.04 mm x 16.01 mm
  • Pattern fixed onto 5mm aluminum plate using adhesive tape

Code used to get the rotation above:

Mat Rt, R, pos;
Rodrigues(rvec ...



I assume you're estimating lens distortion along with everything else? If you aren't, that would do it.

Tetragramm ( 2019-01-28 19:04:54 -0500 )

@Tetragramm: Yes, I use cv::calibrateCamera() on the images of each camera to estimate a camera matrix, distortion coefficients, rotation vectors (rvecs[], one per image) and translation vectors (tvecs[], one per image). The camera matrices and distortion coefficients are later used in stereoCalibrate() to refine the results while using the CV_CALIB_FIX_INTRINSIC flag. stereoCalibrate() then gives me a new translation vector T = [ -0.0089712, 0.00972611, 0.547779 ] (in meters), which means roughly X: -0.90 cm, Y: 0.97 cm, Z: 54.78 cm. This doesn't seem accurate enough...
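
As a quick plausibility check: the T returned by stereoCalibrate() is the translation between the two camera coordinate systems, so its length should be close to the physical baseline (about 8.7 cm in this setup). The sketch below only does that arithmetic; BaselineCheck and checkBaseline are hypothetical helper names.

```cpp
#include <cassert>
#include <cmath>

struct BaselineCheck {
    double length_m;  // Euclidean length of T in metres
    double ratio;     // length_m divided by the hand-measured baseline
};

// Compare the length of the stereo translation vector T = (tx, ty, tz)
// with a baseline measured by hand (both in metres).
BaselineCheck checkBaseline(double tx, double ty, double tz, double measured_m) {
    assert(measured_m > 0.0);
    const double length = std::sqrt(tx * tx + ty * ty + tz * tz);
    return {length, length / measured_m};
}
```

For the T quoted here and a measured baseline of 0.087 m, the length is about 0.548 m, roughly six times the expected value, and almost all of it lies along Z. That is far outside what measurement noise would explain, so the inputs to stereoCalibrate() are probably inconsistent in some way.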

sdwarfs ( 2019-02-05 04:48:19 -0500 )

Your depth is short too, yes? How confident are you on the size of your pattern? Is it perhaps slightly larger or smaller than you think?

Tetragramm ( 2019-02-05 17:47:55 -0500 )

@Tetragramm: The measured size should be quite accurate. I measured the whole pattern and divided by the number of squares along each dimension (~16.04 mm x 16.01 mm). The PDF I used would have produced a 20 x 20 mm pattern, but it was printed at 80% scale to fit onto the aluminium plate, so ~16 mm x 16 mm is the expected size. I added the code of my getKnownPositions() function and the parameters I call it with to the post above. Maybe there is an error in it that I did not see, so please have a closer look at it...
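
To put a number on the pattern-size question: scaling all object points by some factor scales every estimated tvec by the same factor and leaves the intrinsics unchanged. One can therefore ask how large the squares would have to be for the estimated distance to match a reference distance. This is a sketch of that proportion only; impliedSquareMm is a hypothetical name.

```cpp
#include <cassert>
#include <cmath>

// Square size (mm) that would make the estimated distance equal to the
// reference distance, given that tvecs scale linearly with the square size.
double impliedSquareMm(double assumed_square_mm,
                       double estimated_dist_m,
                       double reference_dist_m) {
    assert(estimated_dist_m > 0.0);
    return assumed_square_mm * reference_dist_m / estimated_dist_m;
}
```

impliedSquareMm(16.04, 0.618701, 0.70) gives about 18.1 mm, i.e. squares roughly 13% larger than measured, which a measurement over the whole pattern would be unlikely to miss. This assumes the ~70 cm refers to the same point the calibration measures from, which may not hold for a lens that is 7-10 cm long.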

sdwarfs ( 2019-02-06 12:59:52 -0500 )

Are the cameras pointing straight, or are they canted inwards or outwards? IE: ||, /\, or \/ One possibility is that the lenses aren't quite represented by the pinhole model, and so the focal plane is not located with the physical cameras.

I don't think that's likely it, but I'm not actually sure what else it might be.

Tetragramm ( 2019-02-06 17:50:44 -0500 )

I have two setups. In the one described above, the cameras are slightly canted inwards \ /. The guy who mounted them said this would be preferable (he mentioned an angle below 10°). They are arranged to film approximately the same area 70 cm below the cameras (approx. 50 cm wide). I can't measure anything at the moment, as I am ~50 km away from the setup. However, the distance between the lower ends of the lenses shouldn't match either.

sdwarfs ( 2019-02-07 07:19:25 -0500 )

So my thought is this. You are using something with real cameras. They have lenses that look something like THIS. But you're approximating it with the pinhole camera model, which looks like THIS.

It's possible that that makes the difference, particularly since the focal length is longer than the actual lenses. I dunno for sure. This isn't my area of expertise.

Tetragramm ( 2019-02-09 09:20:53 -0500 )