
3D to 2D Points using cv::projectPoints

asked 2020-11-11 05:27:07 -0600 by Nextar

I have an issue using cv::projectPoints in OpenCV 4.5 to project 3D lidar points into a 2D image.

  • There is no roll/pitch/yaw, so rvec is 0.
  • The points are already in world space and only have to be transformed to camera space with tvec.
  • There is no lens distortion.

I ran the code below at an image resolution of 785x785, and it works fine: the projected points land at the correct positions in the image.

After changing the resolution to 1600x1200, the code no longer works correctly: the projected 2D points are off by approximately 30 px (shifted upward).

I don't understand what the issue is. Does anyone have an idea? I also checked a resolution of 1200x1200, which works correctly again, so the issue seems to come from the width and height being different.

My guess is that there might be an issue with cmat.

    cv::Mat rvec = cv::Mat::zeros(1, 3, CV_32F); // no roll/pitch/yaw

    cv::Mat tvec(3, 1, CV_32F);
    tvec.at<float>(0) = camera.opencv_origin.x;
    tvec.at<float>(1) = -camera.opencv_origin.y; // In coordinate system y and z axis are inverted
    tvec.at<float>(2) = -camera.opencv_origin.z; // In coordinate system y and z axis are inverted

    // cv::Mat::create() leaves the data uninitialized, so start from zeros to
    // make sure the off-diagonal entries of the camera matrix really are 0.
    cv::Mat cmat = cv::Mat::zeros(3, 3, CV_32F);
    cmat.at<float>(0, 0) = camera.image_width / 2;  // fx
    cmat.at<float>(1, 1) = camera.image_height / 2; // fy
    cmat.at<float>(0, 2) = camera.image_width / 2;  // cx
    cmat.at<float>(1, 2) = camera.image_height / 2; // cy
    cmat.at<float>(2, 2) = 1;

    std::vector<cv::Point2f> points_image;
    cv::projectPoints(points_world, rvec, tvec, cmat, cv::noArray(), points_image);
    for (const auto& p : points_image) {
        cv::circle(image, p, 2, cv::Scalar(0, 0, 255), -1);
    }

Comments

Do you want to try your hand at doing the math manually? If so, let me know, and I can upload some code.

sjhalayka (2020-11-11 14:06:23 -0600)

@sjhalayka help in any form is welcome :)

Nextar (2020-11-12 02:56:41 -0600)

P.S. Please don't use the auto keyword LOL.

sjhalayka (2020-11-12 15:28:48 -0600)

2 answers


answered 2020-11-12 07:18:09 -0600 by crackwitz

consider a 3x3 camera matrix. it contains fx, fy, cx, cy, some zeros, and a 1.
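Written out, with rvec = 0 (so the rotation is the identity) and no distortion, this is the mapping cv::projectPoints applies. The labels X, Y, Z, u, v and t are introduced here only for illustration:

```latex
K = \begin{pmatrix} f_x & 0 & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{pmatrix},
\qquad
\begin{pmatrix} X \\ Y \\ Z \end{pmatrix} = \mathbf{p}_{\text{world}} + \mathbf{t},
\qquad
u = f_x \frac{X}{Z} + c_x, \quad v = f_y \frac{Y}{Z} + c_y
```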

fx and fy are usually equal or close, if you want square pixels. they are independent of image aspect ratio (more on that below). you scale them both equally if you have more resolution.

you can calculate fx given a field of view and desired resolution (in each axis, horizontally/vertically).

consider a point right on the edge of the FoV. it's at an angle of fov/2. you want that to go on the edge of the image, so it goes to cx + width/2 (same for y/height, the cx component comes from the third column of the matrix).

so now we have tan(fov/2) * fx = width/2. rearrange, fx = (width/2) / tan(fov/2)

then, fx = fy if you want square pixels. if you have diagonal field of view, the calculation uses hypot(width/2,height/2) instead of just width/2.

now, for the same fx,fy you can have a larger resolution picture (recalculate cx,cy to keep things centered). that merely affects the field of view.

if you want twice the resolution (or any factor), you'd multiply fx,fy,cx,cy,width,height by that factor and everything is okay.


Comments

@crackwitz I tested it out:

    width = 1600
    height = 1200
    fov = 90

    cx = 800
    cy = 600
    fx = width/2 * tan(90/2)
    fx = 800 * 1
    fx = 800
    fy = hypot(1600/2,1200/2) * tan(90/2)
    fy = 1000 * 1
    fy = 1000

This leads to incorrect results (the points aren't in the image anymore). I also know from testing that fx is in fact 800, and fy should be somewhere around 600.

Did I make a mistake? Also, if you know of any literature, could you point me to it so I can read up on the calculation?

Nextar (2020-11-16 03:01:37 -0600)

This is also a sign that there has to be an error in the calculation (maybe I misunderstood something):

fy shouldn't be bigger than fx, as the width is bigger than the height.

Nextar (2020-11-16 03:09:15 -0600)

the calculation involves a division, not multiplication. also, tan() takes angles in radians, while you pass it a value in degrees. radians = degrees/180*pi

crackwitz (2020-11-16 06:52:31 -0600)

as for literature, "Multiple View Geometry in Computer Vision" by Hartley & Zisserman is a hefty tome. I've never looked into it myself, but the professor of the CV class I took referenced it.

crackwitz (2020-11-16 06:55:46 -0600)

@crackwitz Sorry, I meant to write a division (I don't know why I wrote multiplication). But the tan should be correct (I didn't write it out completely, but tan(45) is 1).

Nextar (2020-11-16 08:54:36 -0600)

tan(pi/4) is 1. trigonometric functions work with radians. you can't just use numbers in degrees. as for "incorrect results", please provide data to replicate your findings.

crackwitz (2020-11-16 10:07:09 -0600)

oh and you misunderstood how to handle fy. fy does NOT use the hypotenuse. you use the hypotenuse when you have a DIAGONAL FoV. and again, fx should be equal to fy regardless of aspect ratio, because you have square pixels.

crackwitz (2020-11-16 10:09:12 -0600)

@crackwitz Thanks again; I'm so stupid, I really misunderstood. I even saw an image on another website (regarding diagonal FoV). Basically, I'll end up with:

    width = 1600
    height = 1200
    fov = 90

    cx = 800
    cy = 600
    fx = width/2 / tan(90/2 deg) // too lazy to write deg to rad conversion here
    fx = 800 / 1
    fx = 800
    fy = height/2 / tan(90/2 deg)
    fy = 600 / 1
    fy = 600

Unfortunately, this is the same result as in the original question (the points are a few pixels off toward the top); fy would have to be around 630 to be correct.

Nextar (2020-11-16 12:49:20 -0600)

for an image that isn't square, the horizontal and vertical FoV differ. for fy=630 (and 1600x1200 resolution), I get a vertical FoV of 87.2 degrees (= arctan(h/2 / fy) * 180/pi * 2), and assuming square pixels that would be a horizontal FoV of 103.6 degrees. if you have differences of just a few pixels, that can be due to lens distortion (barrel/pincushion). I would suggest that you post pictures or screenshots to illustrate the problem.

crackwitz (2020-11-16 14:04:48 -0600)

answered 2020-11-12 12:24:54 -0600 by sjhalayka, updated 2020-11-12 19:37:35 -0600

Here is code to do it from "scratch". GLM is a header-only library available from https://glm.g-truc.net

Just be mentally prepared for the code to return screen coordinates that lie outside the viewport when the world position is not within the view frustum.

#include <glm/vec2.hpp>
#include <glm/vec3.hpp>
#include <glm/vec4.hpp>
#include <glm/mat4x4.hpp>
#include <glm/gtc/matrix_transform.hpp>
using namespace glm;

#include <cmath>
#include <iostream>
using namespace std;



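vec2 world_to_screen(const vec3 world, const mat4& MVP, const int viewport_width, const int viewport_height)
{
    const vec4 world4(world.x, world.y, world.z, 1);
    vec4 screen4(MVP * world4);

    // Take care of possible division by zero.
    // Note: points behind the camera have w < 0 and come out mirrored.
    if (screen4.w == 0)
        screen4.w = 1;

    // Map NDC [-1, 1] to [0, viewport size]. As with OpenGL viewports, y = 0 is the
    // bottom edge; for image rows (y = 0 at the top, as in OpenCV) use viewport_height - y.
    vec2 screen2;
    screen2.x = round(screen4.x / screen4.w * viewport_width / 2.0f + viewport_width / 2.0f);
    screen2.y = round(screen4.y / screen4.w * viewport_height / 2.0f + viewport_height / 2.0f);

    return screen2;
}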



int main(void)
{
    const float y_field_of_view_degrees = 45.0f;
    const float near_plane = 0.01f;
    const float far_plane = 1000.0f;
    const vec3 camera_pos(0, 0, -1);
    const vec3 look_at_pos(0, 0, 0);

    const vec3 camera_vector = normalize(camera_pos - look_at_pos);

    // Be flexible enough to take into account
    // an arbitrary camera_pos and look_at_pos
    vec3 side_vector;

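    // The reference up vector must not be parallel to the view direction,
    // so fall back to +Z when looking straight up or down the Y axis.
    if (glm::abs(camera_vector.y) > 0.999f)
        side_vector = normalize(cross(vec3(0, 0, 1), camera_vector));
    else
        side_vector = normalize(cross(vec3(0, 1, 0), camera_vector));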

    const vec3 up_vector = cross(camera_vector, side_vector);

    const vec3 world_pos(3, 1, -10);
    const int viewport_width = 800;
    const int viewport_height = 600;
    const float aspect_ratio = static_cast<float>(viewport_width) / static_cast<float>(viewport_height);

    mat4 projection_mat = perspective(
        radians(y_field_of_view_degrees),
        aspect_ratio,
        near_plane,
        far_plane
    );

    mat4 view_mat = lookAt(
        camera_pos,
        look_at_pos,
        up_vector
    );

    mat4 model_mat(1.0f); // Identity matrix

    mat4 mvp_mat = projection_mat * view_mat * model_mat;
    vec2 screen_pos = world_to_screen(world_pos, mvp_mat, viewport_width, viewport_height);

    cout << screen_pos.x << ' ' << screen_pos.y << endl;

    return 0;
}

If you're interested, here is some custom projection-matrix code that I wrote before I started using GLM:

#include <cmath>

// Builds the same column-major projection matrix as gluPerspective:
// mat[0..3] is the first column, mat[4..7] the second, and so on.
void get_perspective_matrix(float fovy, float aspect, float znear, float zfar, float (&mat)[16])
{
    const float pi = 4.0f*std::atan(1.0f);

    // Convert fovy (degrees) to radians, then divide by 2: fovy/360*pi == (fovy*pi/180)/2
    const float f = 1.0f / std::tan(fovy/360.0f*pi);

    mat[0] = f/aspect; mat[4] = 0; mat[8] = 0;                              mat[12] = 0;
    mat[1] = 0;        mat[5] = f; mat[9] = 0;                              mat[13] = 0;
    mat[2] = 0;        mat[6] = 0; mat[10] = (zfar + znear)/(znear - zfar); mat[14] = (2.0f*zfar*znear)/(znear - zfar);
    mat[3] = 0;        mat[7] = 0; mat[11] = -1;                            mat[15] = 0;
}

You can see that f plays a major role in the projection matrix. I believe that it is analogous to focal length. Regardless, GLM is the way to go! I had basically implemented my own matrix library, but GLM was so much better.

As for screen coordinates lying outside the viewport, there are 8 possible regions where they could lie. Check out this screenshot from a game I'm making, illustrating what I mean (look for the yellow arrow along the bottom edge of the viewport, which indicates the presence of an enemy game ...


Comments


Thanks for the good read :) I really enjoyed it. For now I'll try to stick with OpenCV, but if I'm unable to get it running I may consider switching to OpenGL and using your approach.

Nextar (2020-11-16 02:36:55 -0600)

Yes, it sure makes you appreciate how complicated OpenCV's guts must be. :)

sjhalayka (2020-11-16 12:25:39 -0600)


1 follower

Stats

Asked: 2020-11-11 05:27:07 -0600

Seen: 3,820 times

Last updated: Nov 12 '20