Pose estimation using PnP: strange, wrong results

Hello, I am trying to use the PnP algorithm implementations in OpenCV (EPnP, iterative, etc.) to get metric pose estimates for a pair of cameras (not a conventional stereo rig; the two cameras are free to move independently of each other). My current source of images is a robot simulator (Gazebo), in which two cameras are simulated in a scene of objects. The images are almost ideal: zero distortion, no artifacts.

So to start off, this is my first pair of images.

[image pair 1: left and right camera views]

I take the right camera as the "origin". In metric world coordinates, the left camera is at (1,1,1) and the right at (-1,1,1), i.e. a 2 m baseline along X. Using feature matching, I construct the essential matrix and from it recover the R and t of the left camera w.r.t. the right. This is what I get:

R in euler angles: [-0.00462468, -0.0277675, 0.0017928]
t matrix: [-0.999999598978524; -0.0002907901840156801; -0.0008470441900959029]
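
For context, the relative pose above comes from a pipeline along the following lines (a minimal sketch, assuming OpenCV 3's findEssentialMat/recoverPose API; on 2.4 the same thing is done via the fundamental matrix. ptsRight and ptsLeft are illustrative names for the matched pixel coordinates in the right and left images, and K is the 3x3 intrinsic matrix):

    #include <opencv2/calib3d.hpp>
    using namespace cv;

    // Relative pose of the left camera w.r.t. the right camera,
    // from matched feature points.
    Mat E, R, t, mask;
    E = findEssentialMat(ptsRight, ptsLeft, K, RANSAC, 0.999, 1.0, mask);
    recoverPose(E, ptsRight, ptsLeft, K, R, t, mask);
    // t comes back unit-length: the essential matrix determines
    // translation only up to scale.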

The result is correct: the displacement is only along the X axis in the camera frame. For the second pair, the left camera is now at (1,1,2), i.e. moved upwards by 1 m.

[image pair 2: left and right camera views]

Now the R and t of the left camera w.r.t. the right become:

R in euler angles: [0.0311084, -0.00627169, 0.00125991]
t matrix: [-0.894611301085138; -0.4468450866008623; -0.0002975759140359637]

This again makes sense: there is no rotation, and the displacement along the Y axis is half of the baseline displacement along X. However, this t is only defined up to scale, so it doesn't give me real metric estimates.

So, in order to get metric pose estimates in case 2, I triangulated 3D points from the camera 1 and camera 2 image points of case 1 (taking the known 2 m baseline into account to fix the scale), and then ran PnP with those 3D points and the image points from case 2. Strangely, both the ITERATIVE and EPNP algorithms give me a similar, and completely wrong, result that looks like this:

Pose according to final PNP calculation is: 
Rotation euler angles: [-9.68578, 15.922, -2.9001]
Metric translation in m: [-1.944911461358863; 0.11026997013253; 0.6083336931263812]
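
(Note: solvePnP returns rvec/tvec as the world-to-camera transform, so comparing against the expected metric camera position requires inverting it; a minimal sketch of that standard conversion:)

    // Map the solvePnP output back to a camera pose in world coordinates.
    Mat R_pnp;
    Rodrigues(rvec, R_pnp);             // rotation vector -> 3x3 matrix
    Mat camPos = -R_pnp.t() * tvec;     // camera centre: -R^T * t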

Am I missing something basic here? I thought this should be a relatively straightforward calculation for PnP, given that there's no distortion. Any comments or suggestions would be very helpful, thanks!

EDIT: Code for the PnP implementation

Let's say pair 1 consists of queryImg1 and trainImg1, and pair 2 consists of queryImg2 and trainImg2 (2D vectors of points); trainImg1/trainImg2 come from the static right camera and queryImg1/queryImg2 from the moving left camera. Triangulation with pair 1 results in a vector of 3D points, points3D.
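
For completeness, the triangulation is along these lines (a minimal sketch, assuming cv::triangulatePoints, with the right camera as the world origin; [R | t] is the relative pose of the left camera w.r.t. the right from above, and t is scaled by the known 2 m baseline to make the reconstruction metric):

    // Projection matrices (K is the 3x3 intrinsic matrix, CV_64F):
    // right camera at the world origin, left camera at [R | 2.0 * t].
    Mat P_right = K * Mat::eye(3, 4, CV_64F);
    Mat Rt, P_left;
    hconcat(R, 2.0 * t, Rt);          // 2.0 m = known baseline
    P_left = K * Rt;

    // Triangulate the matched points from pair 1 and dehomogenise.
    Mat points4D;
    triangulatePoints(P_right, P_left, trainImg1, queryImg1, points4D);
    points4D.convertTo(points4D, CV_64F);
    vector<Point3d> points3D;
    for (int i = 0; i < points4D.cols; i++)
    {
        Mat x = points4D.col(i) / points4D.at<double>(3, i);
        points3D.push_back(Point3d(x.at<double>(0), x.at<double>(1), x.at<double>(2)));
    }

The procedure for the matching and PnP step is then: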

  1. Iterate through trainImg1 and check whether the same point exists in trainImg2 (possible because that camera does not move).
  2. If the same feature is found in trainImg2, take the corresponding match from queryImg2.
  3. Form the vectors P3D_tracked (the subset of tracked 3D points) and P2D_tracked (the corresponding tracked 2D points).

    vector<Point3d> P3D_tracked;    // tracked 3D points from pair 1
    vector<Point2d> P2D_tracked;    // corresponding 2D points in pair 2
    for (int i = 0; i < (int)trainImg1.size(); i++)
    {
        // The static camera sees the same feature at the same pixel
        // location in both pairs, so match by exact position.
        vector<Point2d>::iterator iter =
            find(trainImg2.begin(), trainImg2.end(), trainImg1[i]);
        if (iter != trainImg2.end())
        {
            size_t index = distance(trainImg2.begin(), iter);
            P3D_tracked.push_back(points3D[i]);
            P2D_tracked.push_back(queryImg2[index]);
        }
    }
    solvePnP(P3D_tracked, P2D_tracked, K, d, rvec, tvec, false, CV_EPNP);
    

For one example I ran, the original set of points had a size of 761, and the number of tracked features in the second pair was 455.