It depends on your application. Do you have a set of fixed markers, or several markers that can move around the scene?

As you said, solvePnP will give you the RT matrix of the camera given the 3D coordinates of some points seen in the image, and those coordinates have to be known by another method.

For augmented reality with markers, the idea is that you know the real-world size of the markers a priori. For instance, for a 10 cm square marker you can say that the coordinates of its corners are (0,0,0), (0.1,0,0), (0,0.1,0) and (0.1,0.1,0), in metres. Once you have detected it, solvePnP will give you the relative pose of the camera with respect to this marker.

Note that the RT matrix is the transform that converts absolute world coordinates into coordinates relative to the camera. So if the centre of the marker is at P = (0.05,0.05,0,1.0) (homogeneous coordinates), its position relative to the camera will be RT*P. This can also be used to determine the marker's orientation.
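
In block form (my notation, consistent with the conversion code below):

    RT = | R  t |    where R is the 3x3 rotation matrix (from cv::Rodrigues)
         | 0  1 |    and t is the 3x1 translation vector from solvePnP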

Likewise, if you want to draw something as an overlay over the marker (augmented reality), you can use the marker's coordinates as the "world coordinates" and render the overlay based on the computed camera pose.

That said, if you have several mobile markers, you have to compute the relative camera pose for each marker with separate calls to solvePnP.

Note that if the appearance of the markers is known but you don't have their real-world size, you will have to assign them a size in an arbitrary unit, since there is an infinite number of size and 3D-position combinations that produce the same appearance in the camera.

Important: RT is a 4x4 matrix and P is a 4x1 vector (x,y,z,w) where w is 1.0 (homogeneous coordinates). solvePnP gives you a rotation vector rvec (in axis-angle form, not a rotation matrix) and a translation vector tvec. You should compute the 3x3 rotation matrix R from rvec using cv::Rodrigues. I use the following procedure to compute RT from the rvec and tvec returned by solvePnP:

void RvecTvecToRT(const cv::Mat& Rvec, const cv::Mat& Tvec, cv::Mat& RT)
{
    RT = cv::Mat::eye(4, 4, CV_64F); // start from the 4x4 identity matrix
    cv::Mat R;
    cv::Rodrigues(Rvec, R); // rotation vector -> 3x3 rotation matrix
    // Store R and T: the transform from world to camera coordinates
    for (int i = 0; i < 3; i++) {
        RT.at<double>(i, 3) = Tvec.at<double>(i, 0);
        for (int j = 0; j < 3; j++) {
            RT.at<double>(i, j) = R.at<double>(i, j);
        }
    }
}
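
For reference, here is a minimal usage sketch under the 10 cm marker assumption above. The corner detection and the calibration data (cameraMatrix, distCoeffs) are placeholders of mine, not part of the original answer:

    #include <opencv2/calib3d.hpp>

    // Corners of a 10 cm square marker, in metres (same order as the detected 2D corners)
    std::vector<cv::Point3f> objectPoints = {
        {0.0f, 0.0f, 0.0f}, {0.1f, 0.0f, 0.0f},
        {0.0f, 0.1f, 0.0f}, {0.1f, 0.1f, 0.0f}
    };
    std::vector<cv::Point2f> imagePoints; // fill with the detected corners, same order
    cv::Mat cameraMatrix, distCoeffs;     // from your camera calibration (assumed available)

    cv::Mat rvec, tvec;
    cv::solvePnP(objectPoints, imagePoints, cameraMatrix, distCoeffs, rvec, tvec);

    cv::Mat RT;
    RvecTvecToRT(rvec, tvec, RT);

    // Marker centre in homogeneous world coordinates, expressed in camera coordinates
    cv::Mat P = (cv::Mat_<double>(4, 1) << 0.05, 0.05, 0.0, 1.0);
    cv::Mat Pc = RT * P;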

Based on your comment, this is very similar to something I implemented a long time ago, using pictures as AR markers.

Basically, as a pre-processing step, you first have to compute the keypoints and associated descriptors for each AR marker. That is, for each marker you will have a set of keypoint/descriptor pairs. For each 2D keypoint, you have to map a 3D coordinate on the plane of the AR marker (as stated in my previous post).

That is, for each marker:

  1. Compute keypoints and descriptors (SIFT/SURF/etc.)
  2. Assign a 3D point to each keypoint. For instance, the pixel (20,20) in a 100x100 marker image with a real size of 10 cm, whose centre is (0.05,0.05,0), maps to (0.02,0.02,0) (see the sketch after this list)
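
As an illustration of step 2, a minimal sketch of this pixel-to-plane mapping (the function and parameter names are mine, not from the original):

    // Map a 2D keypoint in the marker image to its 3D point on the marker plane (z = 0).
    // markerSizeMetres: real-world edge length; markerImagePixels: marker image edge in pixels.
    cv::Point3f keypointTo3D(const cv::Point2f& kp, float markerSizeMetres, int markerImagePixels)
    {
        const float scale = markerSizeMetres / static_cast<float>(markerImagePixels);
        return cv::Point3f(kp.x * scale, kp.y * scale, 0.0f); // e.g. (20,20) -> (0.02,0.02,0)
    }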

Now, for detection, you have to compute the scene descriptors and, for each marker, match them against that marker's descriptors.

  1. Compute the scene keypoints and descriptors
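
A sketch of this step, assuming a recent OpenCV (4.4+) where SIFT lives in the main features2d module (older versions need xfeatures2d):

    #include <opencv2/features2d.hpp>

    cv::Ptr<cv::SIFT> sift = cv::SIFT::create();
    std::vector<cv::KeyPoint> sceneKeypoints;
    cv::Mat sceneDescriptors;
    sift->detectAndCompute(sceneImage, cv::noArray(), sceneKeypoints, sceneDescriptors);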

Then, for each marker (a code sketch of these steps follows the list):

  1. Match its descriptors against the scene descriptors (use KNN matching; see R. Szeliski's book, the PDF is free on R. Szeliski's web page)
  2. Compute a homography from the matched points using RANSAC (use cv::findHomography)
  3. If no homography is found, or you have fewer than 5 matched points, consider the marker not detected
  4. If it is found, do not use the inliers reported by cv::findHomography; instead, use the homography to map the marker's corners into the image
  5. Discard any matched keypoint that falls outside the mapped region
  6. Use solvePnP with the 3D (marker) coordinates of the remaining matched points and their respective 2D positions in the scene. This recovers good points that were wrongly rejected by findHomography
  7. Since the object is planar, the computed R/T may be mirrored, giving a transform in which the marker's face is not facing the camera; flip the transform to correct it
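
A hedged sketch of steps 1-6 for a single marker. All names here (detectMarkerPose, markerPoints3D, etc.) are illustrative, not from the original answer, and the ratio-test threshold is a common default rather than a prescription:

    #include <opencv2/features2d.hpp>
    #include <opencv2/calib3d.hpp>
    #include <opencv2/imgproc.hpp>

    bool detectMarkerPose(const cv::Mat& markerDescriptors,
                          const std::vector<cv::KeyPoint>& markerKeypoints,
                          const std::vector<cv::Point3f>& markerPoints3D, // 3D point per marker keypoint
                          const std::vector<cv::Point2f>& markerCorners,  // 4 corners of the marker image
                          const std::vector<cv::KeyPoint>& sceneKeypoints,
                          const cv::Mat& sceneDescriptors,
                          const cv::Mat& cameraMatrix, const cv::Mat& distCoeffs,
                          cv::Mat& rvec, cv::Mat& tvec)
    {
        // 1. KNN matching with a ratio test
        cv::BFMatcher matcher(cv::NORM_L2);
        std::vector<std::vector<cv::DMatch>> knn;
        matcher.knnMatch(markerDescriptors, sceneDescriptors, knn, 2);
        std::vector<cv::DMatch> matches;
        for (const auto& m : knn)
            if (m.size() == 2 && m[0].distance < 0.75f * m[1].distance)
                matches.push_back(m[0]);
        if (matches.size() < 5) return false; // 3. too few matches

        // 2. homography with RANSAC
        std::vector<cv::Point2f> mPts, sPts;
        for (const auto& m : matches) {
            mPts.push_back(markerKeypoints[m.queryIdx].pt);
            sPts.push_back(sceneKeypoints[m.trainIdx].pt);
        }
        cv::Mat H = cv::findHomography(mPts, sPts, cv::RANSAC);
        if (H.empty()) return false; // 3. homography not found

        // 4. map the marker's corners into the scene image
        std::vector<cv::Point2f> sceneCorners;
        cv::perspectiveTransform(markerCorners, sceneCorners, H);

        // 5-6. keep only matches inside the mapped region, then run solvePnP on them
        std::vector<cv::Point3f> objPts;
        std::vector<cv::Point2f> imgPts;
        for (const auto& m : matches) {
            const cv::Point2f& p = sceneKeypoints[m.trainIdx].pt;
            if (cv::pointPolygonTest(sceneCorners, p, false) >= 0) {
                objPts.push_back(markerPoints3D[m.queryIdx]);
                imgPts.push_back(p);
            }
        }
        if (objPts.size() < 4) return false;
        return cv::solvePnP(objPts, imgPts, cameraMatrix, distCoeffs, rvec, tvec);
    }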

Once you have detected them, you should use some keypoint tracking algorithm, such as KLT. It is more complicated, and I suggest you take a look at the 3D tracking example in the OpenCV documentation.
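
For completeness, a minimal sketch of the KLT idea with cv::calcOpticalFlowPyrLK (the variable names are placeholders of mine):

    #include <opencv2/video/tracking.hpp>

    // Track last frame's inlier points into the current frame.
    std::vector<cv::Point2f> nextPts;
    std::vector<uchar> status;
    std::vector<float> err;
    cv::calcOpticalFlowPyrLK(prevGray, currGray, imgPts, nextPts, status, err);
    // Keep only points with status[i] == 1, then re-run solvePnP on the survivors.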