I know it's been a month and my original problem was solved back then. However, today I had some time to go through the issue again and figure out why I needed the coordinate system transformation that Tetragramm mentioned in the comments.
Turns out my problem was caused by a subtle bug that comes from using Microsoft's Kinect SDK to create point clouds. With Microsoft's SDK, the depth and color images you receive are mirrored (flipped in the X direction). However, the functions that map image points to world points give their results in the correct, un-mirrored orientation. On top of that, the coordinate system of those world points is different from the camera coordinate system.
As Tetragramm mentioned in the comments, the camera coordinate system has +X pointing to the right, +Y pointing downwards and +Z pointing forward. However, the point cloud obtained by Kinect's SDK has +X pointing to the left and +Y pointing upwards (+Z is the same).
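A minimal sketch of that conversion, assuming the SDK's world points sit in an N x 3 NumPy array (the function and variable names here are mine, purely for illustration):

```python
import numpy as np

def sdk_to_camera_frame(points_sdk: np.ndarray) -> np.ndarray:
    """Convert Kinect SDK world points (+X left, +Y up, +Z forward)
    into the standard camera frame (+X right, +Y down, +Z forward)."""
    points_cam = points_sdk.copy()
    points_cam[:, 0] *= -1.0  # +X left -> +X right
    points_cam[:, 1] *= -1.0  # +Y up   -> +Y down
    return points_cam         # +Z is unchanged
```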
These two facts combined are, in my opinion, very confusing and caused me a lot of headache over time. What made things even worse was that I had calibrated my Kinect's camera using the raw images given to me by Microsoft's SDK. Those images were flipped in the X direction, resulting in calibration matrices that were simply wrong for working with the point cloud data.
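If you do calibrate with the SDK's raw images, un-mirror them first. Something along these lines should work (a sketch using OpenCV's flip; flipCode=1 flips around the vertical axis):

```python
import cv2
import numpy as np

def unmirror(image: np.ndarray) -> np.ndarray:
    # The SDK's depth/color images are flipped in the X direction,
    # so flip them horizontally before running calibration on them.
    return cv2.flip(image, 1)
```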
In summary, Hartley and Zisserman's approach works fine. If you place the world origin at your camera and have its calibration matrix K, you can backproject the 2D image point (x, y) with the following multiplication:
K^(-1) * (x, y, 1)^T

where K^(-1) is the inverse of your calibration matrix and (x, y, 1)^T is the image point in homogeneous coordinates, written as a column vector. Doing so gave me results similar to Tetragramm's answer.
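In code, that back-projection is a one-liner with NumPy. A sketch (the intrinsics below are placeholder values, not my actual calibration):

```python
import numpy as np

# Placeholder calibration matrix; substitute your own fx, fy, cx, cy.
K = np.array([[525.0,   0.0, 319.5],
              [  0.0, 525.0, 239.5],
              [  0.0,   0.0,   1.0]])

def backproject(x: float, y: float) -> np.ndarray:
    """Compute K^(-1) * (x, y, 1)^T for the image point (x, y)."""
    return np.linalg.inv(K) @ np.array([x, y, 1.0])

ray = backproject(320.0, 240.0)
```

Keep in mind the result is a ray direction, defined only up to scale; multiplying it by the depth measured at that pixel gives the actual 3D point in the camera frame.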