Back-projecting a 2D point to a ray
What I need is very simple! Given a 2D point in image coordinates and a calibrated camera, I need the 3D projective ray corresponding to this image point. I know the procedure, mathematically at least, but I'm sure I'm missing something very stupid.
My camera is calibrated using calibrateCamera(). As a result, I have the intrinsics K, rvec, tvec and the distortion coefficients. I'm stuck at creating the 3x4 camera projection matrix P so that I can back-project my 2D points using the pseudo-inverse of P (as described in Multiple View Geometry by Hartley and Zisserman).
How can I correctly create P from the values returned by the calibrateCamera() call? Is there an internal OpenCV function that I can use to create the camera projection matrix or, better yet, to do the back-projection?
What I have tried:
I tried manually creating the matrix P using the formulation P = K[R | T], where R was obtained using cv::Rodrigues(rvec) and T was set to tvec. However, that P matrix was wrong: it produced wrong results even for forward projection of 3D points to 2D coordinates :(
I guess the other confusing thing for me is that I don't know what I'm supposed to do with rvec and tvec. I know they describe the camera pose, i.e. they transform world coordinates into the camera coordinate frame. However, the 3D points that I have are already in the camera coordinate frame because they were obtained using a Kinect! So I don't think any additional transformation is needed.
Any help is greatly appreciated.
From a 2D image point, do you want to get the ray line equation or the 3D coordinate? The first is possible; the second is not possible with a monocular camera unless you use a structure-from-motion technique or assume that the ray intersects a plane of known equation.
Also, if you are using the Kinect, there should already be a function (e.g. in OpenNI or the Kinect SDK) that does that.
@Eduardo Thanks for your response. Sorry if it wasn't clear: I need the ray equation (so image point to ray). Although it is true that the Kinect SDK and OpenNI provide such functionality, that isn't an option for me because I am using offline Kinect data. The data was captured earlier, and I'm only processing it now...
If you need the ray line equation, you should be able to get the 2D coordinate in the normalized camera frame (Z = 1) using the intrinsics. The ray is then the line from the origin (the camera centre) through that point in the normalized camera frame.
@Eduardo Exactly! I know the theory! All I need is to properly convert what calibrateCamera() gives me into the camera projection matrix P (the 3x4 matrix). That's what I'm asking.
To compute the normalized camera coordinates (ignoring the distortion coefficients):
x = (u - c_x) / f_x
y = (v - c_y) / f_y
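The two formulas above can be sketched in C++ as follows, together with the ray they define: the line from the camera centre through (x, y, 1). This is a minimal sketch that ignores distortion; the Intrinsics struct and function names are hypothetical. If the distortion coefficients matter, OpenCV's cv::undistortPoints, called with only the camera matrix and distortion coefficients (no R or P), returns normalized coordinates that could be used in place of the first two components.

```cpp
#include <array>
#include <cmath>

using Vec3 = std::array<double, 3>;

// Hypothetical container for the intrinsics read from K:
// fx = K(0,0), fy = K(1,1), cx = K(0,2), cy = K(1,2). Distortion is ignored.
struct Intrinsics {
    double fx, fy, cx, cy;
};

// Normalized camera coordinates of pixel (u, v): the point on the Z = 1 plane.
Vec3 normalizePixel(const Intrinsics& in, double u, double v) {
    return {(u - in.cx) / in.fx, (v - in.cy) / in.fy, 1.0};
}

// Unit direction of the ray from the camera centre (the origin of the
// camera frame) through pixel (u, v).
Vec3 rayDirection(const Intrinsics& in, double u, double v) {
    const Vec3 d = normalizePixel(in, u, v);
    const double n = std::sqrt(d[0] * d[0] + d[1] * d[1] + d[2] * d[2]);
    return {d[0] / n, d[1] / n, d[2] / n};
}
```

Every point on the ray is s * normalizePixel(...) for some s > 0, so the ray equation needs only K; no 3x4 matrix is involved when the points are already in the camera frame.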
I am not sure I completely understand the final goal. If it is to get the 3D coordinate in the camera frame from the 2D image coordinate, then, as you are using the Kinect, you should have the depth map from the Kinect data, that is, an image that contains the depth at each position.
If the depth image and the color image are aligned, you will be able to get the corresponding 3D for a particular 2D image coordinate in the color image.
Also, when you calibrate a camera, the intrinsic parameters are fixed for that camera, but the extrinsic parameters depend on the scene, so you cannot reuse them for another image or scene.
Thanks! The goal is to do OpenGL-style "picking": which ray corresponds to the 2D pixel under the mouse cursor? As you just mentioned, I already have the 3D coordinates of all my points, so the 3D points themselves are not my goal. While the formula you wrote does back-projection, my original question is still unresolved: what is the 3x4 camera projection matrix that corresponds to my intrinsics and extrinsics? You also said that the extrinsics change per scene. So what if I make sure the world coordinate frame is the same as the camera coordinate frame? In that case, wouldn't the extrinsics become rotation = identity and translation = 0?
The projection matrix should be what you wrote: P = K [R | T]. In the case where the 3D points are already in the camera frame, yes, R = identity (3x3) and T = [0 0 0]^T.
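As a worked check, assuming R = identity and T = 0 as above, the pseudo-inverse construction from Hartley and Zisserman reduces to the normalized-coordinate formulas x = (u - c_x) / f_x and y = (v - c_y) / f_y. Here x denotes the homogeneous pixel (u, v, 1)^T and C the camera centre, the null vector of P:

```latex
P = K\,[\,I \mid 0\,], \qquad
P^{+} = P^{\mathsf T}\left(P P^{\mathsf T}\right)^{-1}
      = \begin{bmatrix} K^{\mathsf T} \\ 0^{\mathsf T} \end{bmatrix} \left(K K^{\mathsf T}\right)^{-1}
      = \begin{bmatrix} K^{-1} \\ 0^{\mathsf T} \end{bmatrix}

\mathbf{C} = \begin{bmatrix} 0 & 0 & 0 & 1 \end{bmatrix}^{\mathsf T} \quad (P\,\mathbf{C} = 0), \qquad
\mathbf{X}(\lambda) = P^{+}\mathbf{x} + \lambda\,\mathbf{C}
  = \begin{bmatrix} K^{-1}\mathbf{x} \\ \lambda \end{bmatrix}

K^{-1}\mathbf{x} = \begin{bmatrix} (u - c_x)/f_x \\ (v - c_y)/f_y \\ 1 \end{bmatrix}
\quad \text{(zero skew)}
```

For λ ≠ 0 the inhomogeneous point is K^{-1}x / λ, i.e. the point at depth 1/λ on the line from the origin through the normalized coordinates, and λ = 0 gives the point at infinity in the ray direction. So building P explicitly and taking its pseudo-inverse yields the same ray as applying K^{-1} directly.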