# How to transform 2D image coordinates to 3D world coordinates with Z = 0?

Hi everyone, I am currently working on a project that involves vehicle detection and tracking, plus estimating and optimizing a cuboid around each vehicle. For that, I take the center of the detected vehicle and need to find the 3D world coordinates of that point, then estimate the world coordinates of the cuboid's edges and project them back onto the image to display it.

I am new to computer vision and OpenCV, but as far as I know, I just need 4 points on the image whose world coordinates I know, and then I can use solvePnP in OpenCV to get the rotation and translation vectors (I already have the camera matrix and distortion coefficients). Then I use Rodrigues to turn the rotation vector into a rotation matrix, concatenate it with the translation vector to get the extrinsic matrix, and multiply the camera matrix by the extrinsic matrix to get the projection matrix.

Since my Z coordinate is zero, I drop the third column of the projection matrix, which gives the homography matrix that maps points on the world plane [X, Y, 1]t to image points. I then take the inverse of the homography matrix, which maps 2D image points back to points on the world plane. Finally, I multiply the image point [x, y, 1]t by the inverse homography matrix to get [wX, wY, w]t and divide the whole vector by the scalar w to get [X, Y, 1]t, which gives me the X and Y values of the world coordinates.
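For reference, the last two steps (inverting the homography, then dividing by w) can be checked independently of OpenCV. The sketch below is only an illustration in plain C++: the names `Mat3`, `inverse3x3` and `applyHomography` are made up for this example, and it assumes H = K [r1 r2 t] has already been built from the projection matrix as described above.

```cpp
#include <array>
#include <cassert>

using Mat3 = std::array<std::array<double, 3>, 3>;

// Inverse of a 3x3 matrix via the adjugate. Asserts that the matrix is invertible.
Mat3 inverse3x3(const Mat3& m) {
    const double det =
        m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1]) -
        m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0]) +
        m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]);
    assert(det != 0.0 && "homography must be invertible");
    Mat3 inv;
    inv[0][0] =  (m[1][1] * m[2][2] - m[1][2] * m[2][1]) / det;
    inv[0][1] = -(m[0][1] * m[2][2] - m[0][2] * m[2][1]) / det;
    inv[0][2] =  (m[0][1] * m[1][2] - m[0][2] * m[1][1]) / det;
    inv[1][0] = -(m[1][0] * m[2][2] - m[1][2] * m[2][0]) / det;
    inv[1][1] =  (m[0][0] * m[2][2] - m[0][2] * m[2][0]) / det;
    inv[1][2] = -(m[0][0] * m[1][2] - m[0][2] * m[1][0]) / det;
    inv[2][0] =  (m[1][0] * m[2][1] - m[1][1] * m[2][0]) / det;
    inv[2][1] = -(m[0][0] * m[2][1] - m[0][1] * m[2][0]) / det;
    inv[2][2] =  (m[0][0] * m[1][1] - m[0][1] * m[1][0]) / det;
    return inv;
}

// Maps (x, y) through H in homogeneous coordinates, then divides by w.
std::array<double, 2> applyHomography(const Mat3& H, double x, double y) {
    const double wx = H[0][0] * x + H[0][1] * y + H[0][2];
    const double wy = H[1][0] * x + H[1][1] * y + H[1][2];
    const double w  = H[2][0] * x + H[2][1] * y + H[2][2];
    assert(w != 0.0 && "point maps to infinity");
    return {wx / w, wy / w};
}
```

With H taken as the world-to-image homography, `applyHomography(H, X, Y)` gives the pixel and `applyHomography(inverse3x3(H), x, y)` gives the world X and Y. A round trip through both is a quick sanity check that the direction of each matrix is what you expect.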

My code is like this:

image_points.push_back(Point2d(275, 204));
image_points.push_back(Point2d(331, 204));
image_points.push_back(Point2d(331, 308));
image_points.push_back(Point2d(275, 308));

cout << "Image Points: " << image_points << endl << endl;

world_points.push_back(Point3d(0.0, 0.0, 0.0));
world_points.push_back(Point3d(1.775, 0.0, 0.0));
world_points.push_back(Point3d(1.775, 4.620, 0.0));
world_points.push_back(Point3d(0.0, 4.620, 0.0));

cout << "World Points: " << world_points << endl << endl;

solvePnP(world_points, image_points, cameraMatrix, distCoeffs, rotationVector, translationVector);
cout << "Rotation Vector: " << endl << rotationVector << endl << endl;
cout << "Translation Vector: " << endl << translationVector << endl << endl;

Rodrigues(rotationVector, rotationMatrix);
cout << "Rotation Matrix: " << endl << rotationMatrix << endl << endl;

hconcat(rotationMatrix, translationVector, extrinsicMatrix);
cout << "Extrinsic Matrix: " << endl << extrinsicMatrix << endl << endl;

projectionMatrix = cameraMatrix * extrinsicMatrix;
cout << "Projection Matrix: " << endl << projectionMatrix << endl << endl;

double p11 = projectionMatrix.at<double>(0, 0),
p12 = projectionMatrix.at<double>(0, 1),
p14 = projectionMatrix.at<double>(0, 3),
p21 = projectionMatrix.at<double>(1, 0),
p22 = projectionMatrix.at<double>(1, 1),
p24 = projectionMatrix.at<double>(1, 3),
p31 = projectionMatrix.at<double>(2, 0),
p32 = projectionMatrix.at<double>(2, 1),
p34 = projectionMatrix.at<double>(2, 3);

homographyMatrix = (Mat_<double>(3, 3) << p11, p12, p14, p21, p22, p24, p31, p32, p34);
cout << "Homography Matrix: " << endl << homographyMatrix << endl << endl;

inverseHomographyMatrix = homographyMatrix.inv();
cout << "Inverse Homography Matrix: " << endl << inverseHomographyMatrix << endl << endl;

Mat point2D = (Mat_<double>(3, 1) << image_points[0].x, image_points[0].y, 1);
cout << "First Image ...

I don't think you can use solvePnP. You have to use a linear least squares method, as explained in this method.

(2017-05-21 02:27:23 -0500)
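The link in the comment above did not survive, so the exact method it pointed to is unknown. As an assumption, one common linear formulation is the direct linear transform with the last homography entry fixed to 1; with exactly four point pairs the least squares problem reduces to an exact 8x8 solve. The sketch below is plain C++ with hypothetical names (`homographyFrom4Points`, `mapPoint`) and estimates the image-to-world homography directly from point pairs. In OpenCV itself, `cv::findHomography` (or `cv::getPerspectiveTransform` for exactly four pairs) would be the usual tool.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <utility>

using Point2 = std::array<double, 2>;
using Mat3 = std::array<std::array<double, 3>, 3>;

// Estimates H (with H[2][2] fixed to 1) such that dst ~ H * src for four
// point pairs, by solving the 8x8 linear system with Gauss-Jordan elimination.
Mat3 homographyFrom4Points(const std::array<Point2, 4>& src,
                           const std::array<Point2, 4>& dst) {
    double a[8][9] = {};  // augmented matrix [A | b], zero-initialized
    for (int i = 0; i < 4; ++i) {
        const double x = src[i][0], y = src[i][1];
        const double X = dst[i][0], Y = dst[i][1];
        double* r0 = a[2 * i];      // equation for X
        double* r1 = a[2 * i + 1];  // equation for Y
        r0[0] = x; r0[1] = y; r0[2] = 1.0; r0[6] = -x * X; r0[7] = -y * X; r0[8] = X;
        r1[3] = x; r1[4] = y; r1[5] = 1.0; r1[6] = -x * Y; r1[7] = -y * Y; r1[8] = Y;
    }
    for (int col = 0; col < 8; ++col) {
        // Partial pivoting: bring the largest remaining entry onto the diagonal.
        int pivot = col;
        for (int r = col + 1; r < 8; ++r)
            if (std::abs(a[r][col]) > std::abs(a[pivot][col])) pivot = r;
        assert(a[pivot][col] != 0.0 && "degenerate point configuration");
        for (int c = 0; c < 9; ++c) std::swap(a[col][c], a[pivot][c]);
        // Eliminate this column from every other row.
        for (int r = 0; r < 8; ++r) {
            if (r == col) continue;
            const double f = a[r][col] / a[col][col];
            for (int c = col; c < 9; ++c) a[r][c] -= f * a[col][c];
        }
    }
    double h[8];
    for (int i = 0; i < 8; ++i) h[i] = a[i][8] / a[i][i];
    return {{{h[0], h[1], h[2]}, {h[3], h[4], h[5]}, {h[6], h[7], 1.0}}};
}

// Applies H to (x, y) and performs the perspective divide.
Point2 mapPoint(const Mat3& H, double x, double y) {
    const double w = H[2][0] * x + H[2][1] * y + H[2][2];
    return {(H[0][0] * x + H[0][1] * y + H[0][2]) / w,
            (H[1][0] * x + H[1][1] * y + H[1][2]) / w};
}
```

Passing the four image points from the question as `src` and the X, Y parts of the four world points as `dst` gives a matrix that takes a pixel straight to the Z = 0 plane via `mapPoint`. Fixing the last entry to 1 is a simplification that breaks down if that entry is truly zero; a null-space solution via SVD avoids this.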

How did you solve this problem?

(2019-08-01 12:17:07 -0500)


solvePnP should be fine, though you may want to set the flags to use the method that specifically works with 4 point sets. Or use more points.

I think you're doing the projection wrong, though. Take a look HERE. This method shows how to get the line of sight (LOS) from a point in an image, given the camera intrinsics and extrinsics. The last section is missing, so you'll need to write it: scale the LOS so that its z component is the negative of the cameraTranslation z value, then add it to the translation to get your result.
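Since the linked code is not available here, the following is only a sketch of the line-of-sight idea in plain C++, not the referenced implementation. It assumes an undistorted pixel, a pinhole camera without skew, and a pose in the solvePnP convention (x_cam = R * X_world + t), so the camera position in world coordinates is -R^T * t. The names `pixelToGround` and `mulTranspose` are hypothetical.

```cpp
#include <array>
#include <cassert>

using Vec3 = std::array<double, 3>;
using Mat3 = std::array<std::array<double, 3>, 3>;

// Returns R^T * v.
Vec3 mulTranspose(const Mat3& R, const Vec3& v) {
    Vec3 out{};
    for (int i = 0; i < 3; ++i)
        for (int j = 0; j < 3; ++j) out[i] += R[j][i] * v[j];
    return out;
}

// Back-projects pixel (u, v) onto the world plane Z = 0.
// fx, fy, cx, cy: pinhole intrinsics (assumed: no skew, pixel already undistorted).
// R, t: world-to-camera pose, x_cam = R * X_world + t.
Vec3 pixelToGround(double u, double v, double fx, double fy, double cx, double cy,
                   const Mat3& R, const Vec3& t) {
    // Line of sight in camera coordinates: K^-1 * [u, v, 1]^T.
    const Vec3 rayCam = {(u - cx) / fx, (v - cy) / fy, 1.0};
    // Rotate the line of sight into world coordinates.
    const Vec3 rayWorld = mulTranspose(R, rayCam);
    // Camera position in world coordinates: -R^T * t.
    const Vec3 rt = mulTranspose(R, t);
    const Vec3 camPos = {-rt[0], -rt[1], -rt[2]};
    assert(rayWorld[2] != 0.0 && "line of sight is parallel to the ground plane");
    // Scale the ray so that its z component cancels the camera height.
    const double s = -camPos[2] / rayWorld[2];
    return {camPos[0] + s * rayWorld[0], camPos[1] + s * rayWorld[1], 0.0};
}
```

For a point on the ground plane this should agree with the inverse-homography route, so running both on the same pixel is a useful cross-check. If the image is distorted, the pixel would need to be undistorted first (for example with `cv::undistortPoints`).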
