# Back-projecting a 2D point to a ray

What I need is very simple! Given a 2D point in image coordinates and a calibrated camera, I need the 3D projective ray corresponding to this image point. I know the procedure, mathematically at least, but I'm sure I'm missing something very stupid.

My camera is calibrated using calibrateCamera(). As a result, I have the intrinsics K, rvec, tvec and the distortion coefficients. I'm stuck at creating the camera projection matrix P (which is 3x4) so that I can back-project my 2d points, using the pseudo-inverse of P (as mentioned in Multiple View Geometry by Hartley and Zisserman).

How can I correctly create P using the values obtained from the calibrateCamera() call? Is there an internal OpenCV function that I can use for creating the camera projection matrix or, better yet, for back-projection?

What I have tried:

I tried manually creating the matrix P using the formulation P = K[R | T], where R was obtained using cv::Rodrigues(rvec) and T was set to tvec. However, that P matrix was wrong because it produced wrong results even for forward projection of 3D points to 2D coordinates :(

I guess the other confusing thing for me is that I don't know what I'm supposed to do with rvec and tvec. I know they are the camera pose information that converts world coordinates to the camera coordinate frame. However, the 3D points that I have are already in the camera coordinate frame because they were obtained using a Kinect! So I don't think any additional transformation is needed.

Any help is greatly appreciated.

From a 2D image point, do you want the ray (line) equation or the 3D coordinate? The first is possible. The second is not possible with a monocular camera unless you use a structure-from-motion technique or assume that the ray intersects a plane of known equation.

Also, if you are using the Kinect, there should already be a function (e.g. in OpenNI or the Kinect SDK) that does that.

( 2016-12-07 03:17:19 -0500 )

@Eduardo Thanks for your response. Sorry if it wasn't clear, I need the ray equation (so image point to ray). Although it is true that Kinect SDK or OpenNI provide such functionality, it isn't an option for me because I am using offline Kinect data. So the data was captured previously, and now I am using it...

( 2016-12-07 08:14:20 -0500 )

If you need the ray equation, use the intrinsics to get the 2D coordinate in the normalized camera frame (Z=1). The ray is then, I think, the line from the origin through that point in the normalized camera frame.

( 2016-12-07 09:01:20 -0500 )

@Eduardo Exactly! I know the theory! But all I need is to properly convert what calibrateCamera() gives me to the camera projection matrix P (the 3x4 matrix). And that's what I'm asking.

( 2016-12-07 09:04:19 -0500 )

To compute the normalized camera coordinates (without considering the distortion coefficients):

• x = (u - c_x) / f_x
• y = (v - c_y) / f_y
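
A minimal sketch of the two formulas above, turned into a unit ray direction in the camera frame. The function name and the intrinsic values are made up for illustration, and lens distortion is ignored, as in the formulas:

```python
import math

def pixel_to_ray(u, v, fx, fy, cx, cy):
    """Return the unit direction, in the camera frame, of the ray through pixel (u, v).

    Lens distortion is ignored, as in the formulas above.
    """
    x = (u - cx) / fx  # normalized image coordinates on the Z = 1 plane
    y = (v - cy) / fy
    norm = math.sqrt(x * x + y * y + 1.0)
    return (x / norm, y / norm, 1.0 / norm)

# Hypothetical intrinsics for a 640x480 sensor (illustrative values only).
fx, fy, cx, cy = 525.0, 525.0, 319.5, 239.5

center_ray = pixel_to_ray(cx, cy, fx, fy, cx, cy)
print(center_ray)  # (0.0, 0.0, 1.0): the principal point maps to the optical axis
```

The ray itself is the set of points s * direction for s >= 0, starting at the camera center. If distortion matters, cv::undistortPoints called without a new projection matrix returns these normalized (x, y) coordinates with the distortion removed.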

I am not sure I completely understand the final goal. If it is to get the 3D coordinate in the camera frame from the 2D image coordinate, then since you are using the Kinect you should have the depth map with the Kinect data, that is, an image that contains the depth at each position.

If the depth image and the color image are aligned, you will be able to get the corresponding 3D point for a particular 2D image coordinate in the color image.

Also, when you calibrate a camera, the intrinsic parameters are fixed for the camera but the extrinsic parameters depend on the scene, so you cannot use them in another image / scene.

( 2016-12-07 11:38:23 -0500 )

Thanks! The goal is to do OpenGL-style "picking". In other words, what ray corresponds to the 2D pixel under the mouse cursor? As you just mentioned, I already have the 3D coordinates of all my points, so my goal is not the 3D points themselves. While the formula you wrote does back-projection, my original question is still unresolved! What is the 3x4 camera projection matrix that corresponds to my intrinsics and extrinsics... You also said that the extrinsics change per scene... So what if I make sure the world coordinate frame is the same as the camera coordinate frame? In that case, wouldn't the extrinsics become rotation = identity and translation = 0?

( 2016-12-07 13:08:01 -0500 )

The projection matrix should be what you wrote: P = K [R | T]. In the case where the 3D points are already in the camera frame, yes R = identity 3x3 and T = [0 0 0]^T.
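
A small sketch of that construction, checked with a forward projection. The helper names and the intrinsic values are hypothetical; R is assumed to be the 3x3 matrix from cv::Rodrigues(rvec) and t the translation from tvec:

```python
def projection_matrix(K, R, t):
    """Build the 3x4 matrix P = K [R | t] from 3x3 K, 3x3 R and a 3-vector t."""
    Rt = [list(R[i]) + [t[i]] for i in range(3)]  # 3x4 block [R | t]
    return [[sum(K[i][k] * Rt[k][j] for k in range(3)) for j in range(4)]
            for i in range(3)]

def project(P, X):
    """Project a 3D point X = (X, Y, Z) to pixel coordinates (u, v)."""
    Xh = (X[0], X[1], X[2], 1.0)  # homogeneous coordinates
    x = [sum(P[i][j] * Xh[j] for j in range(4)) for i in range(3)]
    return (x[0] / x[2], x[1] / x[2])  # divide by the third component

# Hypothetical intrinsics (illustrative values only).
K = [[525.0, 0.0, 319.5],
     [0.0, 525.0, 239.5],
     [0.0, 0.0, 1.0]]
R_identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
t_zero = [0.0, 0.0, 0.0]

P = projection_matrix(K, R_identity, t_zero)  # reduces to [K | 0]
u, v = project(P, (0.2, -0.1, 2.0))  # a point already in the camera frame, in meters
print(u, v)  # approximately 372.0 213.25
```

Two things that commonly make a hand-built P look wrong are skipping the final division by the third homogeneous component, and using an rvec/tvec pair from a calibration view when the 3D points are already in the camera frame.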

( 2016-12-08 03:12:47 -0500 )

I do this as part of the mapping3d module I'm making.

The relevant code is HERE.

Note that the LOS (line-of-sight) variable initially starts as the LOS from position (0,0,0) with no rotations. To get the LOS in another frame you have to apply the camera->frame rotation, which you can also see an example of in lines 217-221.

@Tetragramm Sorry for the late reply and thanks for your answer... I'm not entirely sure what's happening in the snippet you linked to. Are you computing the FoV based on the focal length values? I've managed to get some sort of back-projection working by converting my intrinsics matrix to an OpenGL-style projection matrix and finding the 3D ray using the near and far planes of the view frustum, but my method obviously lacks accuracy :( I'd really like to know what's happening in your code.

( 2016-12-11 21:58:27 -0500 )

Basically. See HERE, which explains how that works.

The focal length of the camera is in pixel units, so instead of getting the full FOV, I get the FOV if the FPA was only as big as the point I'm interested in. Hence it's the angle to the point. Convert that angle to the elevation, and the direction on the FPA from the center to the azimuth, and you have everything you need.
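
As a rough illustration of that idea (this is only one reading of the description above, not the linked code), the off-axis angle and the azimuth can be turned into a unit direction as follows. The function name is made up, a single focal length f in pixels is assumed (f_x roughly equal to f_y), FPA is taken to mean the focal plane array, and distortion is ignored:

```python
import math

def los_from_angles(u, v, f, cx, cy):
    """Unit line-of-sight direction for pixel (u, v), built from two angles.

    off_axis: angle between the ray and the optical axis (+Z).
    azimuth:  direction of the pixel on the focal plane, measured from +X.
    """
    dx, dy = u - cx, v - cy
    off_axis = math.atan2(math.hypot(dx, dy), f)
    azimuth = math.atan2(dy, dx)
    s = math.sin(off_axis)
    return (s * math.cos(azimuth), s * math.sin(azimuth), math.cos(off_axis))

# Hypothetical values: f = 525 px, principal point (319.5, 239.5).
los = los_from_angles(400.0, 100.0, 525.0, 319.5, 239.5)
```

Under these assumptions the result points the same way as the vector (u - c_x, v - c_y, f), so it should agree with the direct normalized-coordinate approach whenever f_x and f_y are equal.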

( 2016-12-11 23:07:57 -0500 )

@Tetragramm Thanks! It's starting to make sense : ) So I know that if I try to backproject using camera P matrix directly, it would give me wrong results for the reason you just mentioned (camera parameters are in pixels, world coordinates I have are in meters). I want to be sure so I'm asking: is LOS a ray in world system or is it still in pixels?? Thanks again

( 2016-12-13 08:35:29 -0500 )

It's a unit ray in the world system. It carries no distance information, just direction. I'm not sure about using the projection matrix directly, but it decomposes into all the information you need.

The only assumed unit in the code is that the camera matrix has focal length in pixels. If that's not true, the FOV calculation will be wrong, and everything past that.

( 2016-12-13 17:58:59 -0500 )

@Tetragramm I spent the past couple of days working on this again, using what you've provided in your code. For reasons I don't understand, these calculations are giving me wrong results... Yes, my calibration matrix is in pixel units. For a known 3D point and its 2D image, when I back-project using your calculations and intersect the resulting ray with a known 3D plane, I get a 3D point which is about 0.25 meters away from the actual 3D point. However, using the stupid OpenGL projection matrix route that I mentioned earlier, I get a 3D point which is 0.02 meters away... I can't figure out why and this is frustrating : ( I can include the OpenGL method, if you're interested.
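
For anyone repeating this check, a minimal sketch of the ray-plane intersection step. The plane is assumed to be given as normal . X + d = 0 in the same frame as the ray, and the function name and numbers are illustrative only:

```python
def intersect_ray_plane(origin, direction, normal, d, eps=1e-12):
    """Intersect the ray origin + s * direction (s >= 0) with the plane normal . X + d = 0.

    Returns the 3D intersection point, or None if the ray is parallel to the
    plane or the plane lies behind the ray origin.
    """
    denom = sum(n * v for n, v in zip(normal, direction))
    if abs(denom) < eps:
        return None
    s = -(sum(n * o for n, o in zip(normal, origin)) + d) / denom
    if s < 0:
        return None
    return tuple(o + s * v for o, v in zip(origin, direction))

# Hypothetical plane Z = 2 m in the camera frame: normal (0, 0, 1), d = -2.
hit = intersect_ray_plane((0.0, 0.0, 0.0), (0.1, -0.05, 1.0), (0.0, 0.0, 1.0), -2.0)
print(hit)  # (0.2, -0.1, 2.0)
```

The distance between hit and the known 3D point is then the error figure quoted in this comment. The direction does not need to be normalized for the intersection itself.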

( 2016-12-16 10:06:47 -0500 )

@Tetragramm Literally 10 minutes after posting this comment I found something out! I have to negate the X and Y components of the direction vector that I get from your computations for some reason! Negating those values gave me the right ray with slightly better accuracy than my "OpenGL" approach. Trying to figure out why now... : )

( 2016-12-16 10:22:51 -0500 )

Ah, you're using an OpenGL rotation matrix? To convert that to the coordinate system used in OpenCV (i.e. the one the solvePnP method works in), multiply it by

    1  0  0
    0 -1  0
    0  0 -1


You may also need to transpose the result of your multiplication. I'm not sure if that applies to all OpenGL rotations, or just the one I'm currently using.

( 2016-12-16 15:03:12 -0500 )

@Tetragramm I'm rendering using OpenGL, but my computations are not dependent on it. So I don't think the coordinate system handedness is an issue... Let me double-check though... For now, negating the X and Y components of the ray works well : ) Thanks a lot for your answer btw. This thing works excellently : )

( 2016-12-16 15:07:28 -0500 )

It's not handedness, it's just a different coordinate system. OpenCV assumes +x points to the right of the image, +y to the bottom of the image, and +z forward, out of the camera. OpenGL has +y pointing to the top of the image and +z pointing into the camera.
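
To make the two conventions concrete, here is a small sketch that applies the diag(1, -1, -1) matrix from the earlier comment to a vector expressed in camera coordinates. The names are made up, and this covers points and directions only, not the conversion of a full rotation matrix:

```python
# Flip of the Y and Z axes: OpenCV camera axes <-> OpenGL camera axes.
FLIP_YZ = ((1, 0, 0),
           (0, -1, 0),
           (0, 0, -1))

def convert_camera_axes(vec):
    """Convert a 3-vector between OpenCV and OpenGL camera axes (works both ways)."""
    return tuple(sum(FLIP_YZ[i][j] * vec[j] for j in range(3)) for i in range(3))

forward_opencv = (0.0, 0.0, 1.0)  # straight ahead, out of the camera, in OpenCV terms
forward_opengl = convert_camera_axes(forward_opencv)
print(forward_opengl)  # z becomes -1.0: an OpenGL camera looks down its -Z axis
```

Because the matrix is its own inverse, the same function converts in either direction.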

( 2016-12-16 15:17:42 -0500 )

I know it's been a month and my original problem was solved back then. However, today I had some time to go through the issue again and figure out why I needed the coordinate system transformation that Tetragramm mentioned in the comments.

It turns out my problem was caused by a subtle bug that resulted from using Microsoft's Kinect SDK for creating point clouds. If you use Microsoft's SDK, the depth and color images that are given to you are mirrored (flipped in the X direction). However, the functions that map image points to world points give their results in the correct orientation!! Not only that, but the coordinate system of those world points is also different from the camera coordinate system.

As Tetragramm mentioned in the comments, the camera coordinate system has +X pointing to the right, +Y pointing downwards and +Z pointing forward. However, the point cloud obtained by Kinect's SDK has +X pointing to the left and +Y pointing upwards (+Z is the same).
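
Taking the two conventions exactly as described in the paragraph above, a point from the SDK's point cloud can be moved into the OpenCV camera frame by negating X and Y. This is only a sketch under that description, with a made-up function name:

```python
def kinect_sdk_to_opencv(point):
    """Map a point from the Kinect SDK frame described above (+X left, +Y up,
    +Z forward) to the OpenCV camera frame (+X right, +Y down, +Z forward)."""
    x, y, z = point
    return (-x, -y, z)

# Illustrative point, in meters.
p_cv = kinect_sdk_to_opencv((0.5, 0.25, 2.0))
print(p_cv)  # (-0.5, -0.25, 2.0)
```

Negating two axes is a 180-degree rotation about Z, so both frames have the same handedness. It is also the same sign change as the "negate X and Y" workaround from the comments.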

These two facts combined are, in my opinion, very confusing and have caused me a lot of headaches over time. What made it even worse was that I had calibrated my Kinect's camera using the raw images that were given to me by Microsoft's SDK. So they were flipped in the X direction, resulting in calibration matrices that were just wrong for working with the point cloud data!!!!

In summary, Hartley and Zisserman's approach works fine. If you define the origin of your world at your camera, and have the calibration matrix K of your camera, to backproject the 2D image point (x, y), simply do the following multiplication:

K^(-1) * (x, y, 1)^T


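where K^(-1) is the inverse of your calibration matrix and (x, y, 1)^T is the image point in homogeneous coordinates, written as a column vector. The result is the direction of the ray in the camera frame. Doing so gave me results similar to Tetragramm's answer.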
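
For reference, here is how that product expands if K has the usual zero-skew form (an assumption; the exact form of K is not stated anywhere in this thread):

```latex
K = \begin{pmatrix} f_x & 0 & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{pmatrix},
\qquad
K^{-1} = \begin{pmatrix} 1/f_x & 0 & -c_x/f_x \\ 0 & 1/f_y & -c_y/f_y \\ 0 & 0 & 1 \end{pmatrix}

K^{-1} \begin{pmatrix} x \\ y \\ 1 \end{pmatrix}
= \begin{pmatrix} (x - c_x)/f_x \\ (y - c_y)/f_y \\ 1 \end{pmatrix}
```

This is the same point on the Z = 1 plane that Eduardo's formulas give in the comments on the question, with the pixel written as (x, y) instead of (u, v). Every positive multiple of this vector lies on the ray through the camera center.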

I should point out that the Hartley and Zisserman approach is significantly more accurate than what I was using, and is what I am using now.

I'm glad you got it working. Bad coordinate systems are simply hard to work with. Be glad it was still right-handed. You don't know bad until you work on a system with approximately half left-handed coordinate systems and half right-handed.

( 2017-01-18 18:07:17 -0500 )

@Tetragramm Haha, true! Believe me, I do! Some LiDAR data have the coordinate system all messed up! I've been bitten by this stupid thing so many times, yet I don't know why I haven't learned from my mistakes... : (

( 2017-01-18 18:10:15 -0500 )
