# Scale and orientation of cameraspace

## Introduction

I have obtained my camera intrinsics using the standard chessboard method, with source images at 640x512. This returns the following values:

- cx, cy = 319.5, 255.5 (very close to image centre, as expected)
- fx = fy = 1731.4

I have done the classic camera calibration and pose estimation as follows:

- Chessboard lying flat on table
- Camera positioned around 800mm above the chessboard, looking straight down, such that board edges are closely parallel to image edges
- Use findChessboardCorners() to get imagespace inner corners
- Create modelspace corners directly, with (0,0,0) being the top-left inner corner, and side length being 21.5mm (as measured in the real world)
- Use solvePnP() to get transform (rotation, translation) from the board modelspace to cameraspace
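
The modelspace corners in the setup above can be sketched as follows. This is a minimal illustration, not the exact code used: the inner-corner counts `COLS` and `ROWS` are hypothetical (the pattern size is not stated here), and it assumes the detected corners are ordered row by row starting from the top-left inner corner.

```python
import numpy as np

# Hypothetical inner-corner counts; the actual pattern size is an assumption.
COLS, ROWS = 9, 6
SQUARE_MM = 21.5  # measured side length of one square, in mm


def board_corners(cols, rows, square):
    """Return (rows*cols, 3) modelspace corners, row by row, all with z = 0."""
    pts = np.zeros((rows * cols, 3), dtype=np.float32)
    # x increases along a row, y increases from one row to the next
    pts[:, :2] = np.mgrid[0:cols, 0:rows].T.reshape(-1, 2) * square
    return pts


object_points = board_corners(COLS, ROWS, SQUARE_MM)
print(object_points[:3])
```

An array like `object_points` is what would be passed to `cv2.solvePnP()` together with the detected imagespace corners, the camera matrix and the distortion coefficients. Note that `solvePnP()` returns the rotation as a Rodrigues vector, which `cv2.Rodrigues()` converts to a 3x3 matrix.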

The final step gives me a rotation very close to identity, which is as expected given that my camera is looking directly down at the chessboard. It also gives me a translation where x and y are small (< 100) and z = 807. The size of z is as expected: it closely matches the measured distance between the camera and board, in mm.

## Question 1

However, I am confused by the direction of the camera local z axis. My understanding is that:

- In modelspace, z points directly up from the board
- In the pinhole camera model, cameraspace z points into the scene, i.e. from the origin towards the image plane

But given that solvePnP() returns a rotation of close to identity, it's effectively giving me a cameraspace orientation where the NEGATIVE z axis points towards the board, rather than the POSITIVE z axis.
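
A quick numerical check of where the board ends up in cameraspace may help here. It is a sketch that assumes the usual convention that `solvePnP()` maps model points to cameraspace as `R * b + T`, takes R as exactly identity, and sets the small x and y translation to zero for simplicity.

```python
import numpy as np

R = np.eye(3)                    # rotation taken as exactly identity (assumption)
T = np.array([0.0, 0.0, 807.0])  # small x, y translation set to 0 (assumption)


def model_to_camera(b):
    """Map a modelspace point into cameraspace: R * b + T."""
    return R @ b + T


origin_cam = model_to_camera(np.zeros(3))    # board origin seen from the camera
z_axis_cam = R @ np.array([0.0, 0.0, 1.0])   # direction of model z in cameraspace

print(origin_cam)  # [  0.   0. 807.]
print(z_axis_cam)  # [0. 0. 1.]
```

Under these assumptions the board origin sits at positive cameraspace z, so the positive z axis does point towards the board. The identity rotation then says that model z is parallel to camera z, which suggests that with x to the right and y down the image, model z points down into the table rather than up from the board.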

## Question 2

I am confused about the scale factor required to convert back from imagespace to board modelspace.

- Let R and T be the rotation and translation returned by solvePnP()
- Let RI be the inverse (transpose) of R
- Let MI be the inverse of the camera intrinsics matrix
- Let p be my imagespace point

Then my viewspace point v = MI * p and my modelspace point b = RI * (s * v - T)

I have confirmed by inspection that this calculation of b is correct, IF the scaling factor s is set to be the distance of the camera origin from the board (around 807).

Why is this the case? How should I be deriving this scaling factor?
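
One way to reason about s, assuming the usual pinhole relation `s * p = M * (R * b + T)` with p written as (px, py, 1): since v = MI * p has third component 1, s is the cameraspace depth of the point. For a point on the board plane (model z = 0), s can be solved from the third row of b = RI * (s * v - T). The sketch below uses the intrinsics from the introduction, with an illustrative T and an exact identity R, and ignores lens distortion.

```python
import numpy as np

# Intrinsics from the introduction; T's x and y are illustrative values.
M = np.array([[1731.4, 0.0, 319.5],
              [0.0, 1731.4, 255.5],
              [0.0, 0.0, 1.0]])
R = np.eye(3)
T = np.array([30.0, -20.0, 807.0])


def pixel_to_board(p_xy, M, R, T):
    """Back-project a pixel onto the board plane (model z = 0).

    Returns (b, s): the modelspace point and the scale that was used.
    """
    p = np.array([p_xy[0], p_xy[1], 1.0])
    v = np.linalg.solve(M, p)            # v = MI * p, third component is 1
    RI = R.T                             # inverse of a rotation is its transpose
    # Third row of b = RI * (s * v - T) with b_z = 0 gives s directly
    s = (RI @ T)[2] / (RI @ v)[2]
    b = RI @ (s * v - T)
    return b, s


b, s = pixel_to_board((319.5, 255.5), M, R, T)
print(b, s)
```

With R close to identity, `(RI @ v)[2]` is close to 1 and `(RI @ T)[2]` is close to the z component of T, which is consistent with s coming out at around 807 for every pixel.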

## Question 3

Probably related to Q2... I am unsure about the scaling factor from imagespace to modelspace (worldspace). I want to construct worldspace rays running from the camera origin through imagespace points (pixels). I understand that in cameraspace I can place points at (px - cx, py - cy, pz), where px, py are my pixel coordinates and pz is the camera focal length in pixels (element 0,0 of the intrinsics matrix). But I need to scale these points to worldspace. How?

Thanks!

For Q3, I've realised that for constructing the rays, scale doesn't matter... I can scale all of px, py, pz by the same factor without affecting the ray direction. However, I'm still interested in how I can derive the scaling from imagespace to worldspace!
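
A minimal sketch of the ray construction, assuming the same `R * b + T` convention for the pose, treating modelspace as worldspace, and ignoring lens distortion. The function name `pixel_ray` is hypothetical. The cameraspace direction MI * p is the vector (px - cx, py - cy, f) divided by f when fx = fy = f, so it describes the same ray.

```python
import numpy as np

M = np.array([[1731.4, 0.0, 319.5],
              [0.0, 1731.4, 255.5],
              [0.0, 0.0, 1.0]])
R = np.eye(3)                    # pose taken as exactly identity (assumption)
T = np.array([0.0, 0.0, 807.0])  # small x, y translation set to 0 (assumption)


def pixel_ray(p_xy, M, R, T):
    """Return (origin, unit direction) of the worldspace ray through a pixel."""
    origin = -R.T @ T                          # camera centre in worldspace
    v = np.linalg.solve(M, np.array([p_xy[0], p_xy[1], 1.0]))
    d = R.T @ v                                # rotate direction into worldspace
    return origin, d / np.linalg.norm(d)


origin, direction = pixel_ray((319.5, 255.5), M, R, T)

# Intersect with the board plane z = 0: origin + t * direction
t = -origin[2] / direction[2]
hit = origin + t * direction
print(origin, direction, hit)
```

Because only the direction is normalised, no explicit scale is needed for the ray itself. A metric scale only appears when the ray is intersected with known geometry, such as the board plane.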