Introduction
I have obtained my camera intrinsics using the standard chessboard method, with source images at 640x512. This returns the following values:
- cx, cy = 319.5, 255.5 (very close to image centre, as expected)
- fx = fy = 1731.4
I have done the classic camera calibration and pose estimation as follows:
- Chessboard lying flat on table
- Camera positioned around 800mm above the chessboard, looking straight down, such that board edges are closely parallel to image edges
- Use findChessboardCorners() to get imagespace inner corners
- Create modelspace corners directly, with (0,0,0) being the top-left inner corner, and side length being 21.5mm (as measured in the real world)
- Use solvePnP() to get transform (rotation, translation) from the board modelspace to cameraspace
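As a minimal sketch of the "create modelspace corners" step above: the inner-corner counts (9 x 6) are an assumption, since the board size is not stated here, and the helper name make_model_corners is made up for illustration. The row-by-row ordering is assumed to match the order in which findChessboardCorners() returns the imagespace corners.

```python
# Hypothetical inner-corner counts; the real board size is not stated in the post.
COLS, ROWS = 9, 6
SQUARE_MM = 21.5  # measured side length of one square, in mm


def make_model_corners(cols, rows, square):
    """Return inner-corner coordinates row by row, on the z = 0 plane.

    (0, 0, 0) is the top-left inner corner; x runs along a row and
    y runs down the rows.
    """
    return [(c * square, r * square, 0.0)
            for r in range(rows)
            for c in range(cols)]


model_corners = make_model_corners(COLS, ROWS, SQUARE_MM)
print(model_corners[:3])
```

Before being passed to solvePnP(), a list like this would typically be converted to a float array with one row per corner.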
The final step gives me a rotation very close to identity. This is as expected, given that my camera is looking directly down at the chessboard. It also gives me a translation where x and y are small (< 100) and z = 807. The size of z is as expected: it closely matches the measured distance between the camera and the board, in mm.
Question 1
However, I am confused by the direction of the camera local z axis. My understanding is that:
- In modelspace, z points directly up from the board
- In the pinhole camera model, cameraspace z points into the scene, i.e. from the origin towards the image plane
But given that solvePnP() returns a rotation close to identity, it's effectively giving me a cameraspace orientation where the NEGATIVE z axis points towards the board, rather than the POSITIVE z axis.
Question 2
I am confused about the scale factor required to convert back from imagespace to board modelspace.
- Let R and T be the rotation and translation returned by solvePnP()
- Let RI be the inverse (transpose) of R
- Let MI be the inverse of the camera intrinsics matrix
- Let p be my imagespace point
Then my viewspace point v = MI * p and my modelspace point b = RI * (s * v - T)
I have confirmed by inspection that this calculation of b is correct, IF the scaling factor s is set to be the distance of the camera origin from the board (around 807).
Why is this the case? How should I be deriving this scaling factor?
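The calculation in question can be sketched as a round trip in plain Python. The pose below is a placeholder (an exact identity rotation and a made-up translation), not real solvePnP() output, and the helper names are illustrative only. A zero-skew intrinsics matrix is assumed, so MI * p reduces to ((px - cx) / fx, (py - cy) / fy, 1).

```python
FX = FY = 1731.4
CX, CY = 319.5, 255.5

# Placeholder pose: exact identity rotation, made-up translation in mm.
R = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
T = [-60.0, -40.0, 807.0]


def mat_vec(m, v):
    """Multiply a 3x3 matrix by a 3-vector."""
    return [sum(m[i][j] * v[j] for j in range(3)) for i in range(3)]


def transpose(m):
    """Transpose a 3x3 matrix (the inverse of a rotation matrix)."""
    return [[m[j][i] for j in range(3)] for i in range(3)]


def project(b):
    """Board point -> (pixel, cameraspace depth) under the pinhole model."""
    c = [x + t for x, t in zip(mat_vec(R, b), T)]  # cameraspace point
    return (FX * c[0] / c[2] + CX, FY * c[1] / c[2] + CY), c[2]


def backproject(p, s):
    """b = RI * (s * v - T), with v = MI * (px, py, 1)."""
    v = [(p[0] - CX) / FX, (p[1] - CY) / FY, 1.0]
    scaled = [s * vi - ti for vi, ti in zip(v, T)]
    return mat_vec(transpose(R), scaled)


b = [43.0, 21.5, 0.0]        # a board corner, in mm
p, z = project(b)            # pixel and its cameraspace depth
b_back = backproject(p, z)   # recovers b when s equals that depth
print(p, z, b_back)
```

In this sketch the value of s that makes the round trip exact is the cameraspace z of the point being recovered. It equals the translation z (807) for every board point only because the placeholder rotation is exactly the identity.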
Question 3
Probably related to Q2... I am unsure about the scaling factor from imagespace to modelspace (worldspace). I want to construct worldspace rays, running from the camera origin through imagespace points (pixels). I understand that in cameraspace I can place points at (px - cx, py - cy, pz), where px, py are my pixel coordinates and pz is the camera focal length in pixels (which is element 0,0 of the intrinsics matrix). But I need to scale these points to worldspace. How?
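One common way to build such rays, sketched here under assumptions rather than as a definitive answer: take the camera origin in modelspace as -RI * T and the ray direction as RI * (px - cx, py - cy, f). A ray direction has no fixed length, so no scale factor is applied to it; a scale only appears when the ray is intersected with something, such as the board plane z = 0. The pose values and function names below are placeholders.

```python
FX = 1731.4              # focal length in pixels (fx = fy)
CX, CY = 319.5, 255.5


def mat_vec(m, v):
    """Multiply a 3x3 matrix by a 3-vector."""
    return [sum(m[i][j] * v[j] for j in range(3)) for i in range(3)]


def transpose(m):
    """Transpose a 3x3 matrix (the inverse of a rotation matrix)."""
    return [[m[j][i] for j in range(3)] for i in range(3)]


def pixel_ray_in_modelspace(px, py, R, T):
    """Return (origin, direction) of the ray through a pixel, in modelspace."""
    RI = transpose(R)
    origin = [-x for x in mat_vec(RI, T)]   # camera centre: -RI * T
    d_cam = [px - CX, py - CY, FX]          # cameraspace direction
    return origin, mat_vec(RI, d_cam)


def intersect_board_plane(origin, direction):
    """Intersect the ray with the board plane z = 0 (ray must not be parallel)."""
    t = -origin[2] / direction[2]
    return [o + t * d for o, d in zip(origin, direction)]


# Placeholder pose: identity rotation, made-up translation in mm.
R = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
T = [-60.0, -40.0, 807.0]

origin, direction = pixel_ray_in_modelspace(CX, CY, R, T)
hit = intersect_board_plane(origin, direction)
print(origin, direction, hit)
```

With this placeholder pose the camera origin comes out at modelspace z = -807, and the ray through the principal point meets the board at (60, 40, 0).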
Thanks!