This answer collects some experiments I ran to better understand the concept of homography. Even though it is not really an answer to the original post, I hope it will be useful to other people, and it is a good way for me to summarize the information I gathered. I have also added the code needed to link the theory with the practice.


What is the homography matrix?

For the theory, refer to a computer vision course (e.g. Lecture 16: Planar Homographies, ...) or book (e.g. Multiple View Geometry in Computer Vision, Computer Vision: Algorithms and Applications, ...). In short, a planar homography relates two planes by a projective transformation that is defined up to a scale factor: Homography
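The equation image is not reproduced in this text. As a reminder, the usual way to write the relation is the following, where (x, y) and (x', y') are matching points in the two planes and s is an arbitrary non-zero scale (the symbol names are my choice here):

```latex
s \begin{bmatrix} x' \\ y' \\ 1 \end{bmatrix}
= H \begin{bmatrix} x \\ y \\ 1 \end{bmatrix}
= \begin{bmatrix}
    h_{11} & h_{12} & h_{13} \\
    h_{21} & h_{22} & h_{23} \\
    h_{31} & h_{32} & h_{33}
  \end{bmatrix}
  \begin{bmatrix} x \\ y \\ 1 \end{bmatrix}
```

Because H is only defined up to scale, it has 8 degrees of freedom and is usually normalized so that h_33 = 1.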

This planar transformation can be between:

  • a planar object and the image plane (image from here, p9):

Homography transformation

  • a planar surface viewed by two cameras (image from here, p56 and here, p10):

Homography transformation2-1 Homography transformation2-2

  • a camera rotating around its axis of projection, which is equivalent to considering that the points lie on a plane at infinity (image from here, p11):

Homography transformation3


How can the homography be useful?

  • Camera pose estimation with coplanar points (see here or here, p30); the homography matrix can be estimated using the DLT (Direct Linear Transform) algorithm
  • Perspective removal, correction: Perspective correction
  • Panorama stitching: Panorama stitching

Demo 1: perspective correction

The function findChessboardCorners() finds the chessboard corner locations (the left image is the source, the right image is the desired perspective view):

findChessboardCorners

The homography matrix can be estimated with findHomography() (four or more point correspondences) or getPerspectiveTransform() (exactly four correspondences):

H:
[0.3290339333220102, -1.244138808862929, 536.4769088231476;
 0.6969763913334048, -0.08935909072571532, -80.34068504082408;
 0.00040511729592961, -0.001079740100565012, 0.9999999999999999]

The first image can be warped to the desired perspective view using warpPerspective() (left: desired perspective view, right: left image warped):

warpPerspective


Demo 2: compute the homography matrix from the camera displacement

With the function solvePnP(), we can estimate the camera poses (rvec1, tvec1 and rvec2, tvec2) for the two images and draw the corresponding object frames:

  • Camera pose for the first camera: c1Mo
  • Camera pose for the second camera: c2Mo
  • Homogeneous transformation between the two cameras: c2Mc1

solvePnP

It is then possible to use the camera pose information to compute the homography transformation related to a specific object plane:

Homography Wikipedia

By Homography-transl.svg: Per Rosengren derivative work: Appoose (Homography-transl.svg) CC BY 3.0, via Wikimedia Commons

In this figure, n is the normal vector of the plane and d is the distance between the camera frame and the plane along the plane normal. The equation to compute the homography from the camera displacement is:

Homography from camera displacement

Where H_1to2 is the homography matrix that maps the points in the first camera frame to the corresponding points in the second camera frame, R_1to2 is the rotation matrix that represents the rotation between the two camera frames and t_1to2 the translation vector between the two camera frames.
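The formula image is not reproduced in this text. Written out with the sign convention that the code in this answer uses (the Wikipedia one, see the note on signs at the end), it should read:

```latex
H_{\mathrm{1to2}} = R_{\mathrm{1to2}} - \frac{t_{\mathrm{1to2}} \, n^{T}}{d}
```

Here t_1to2 n^T is a 3x3 outer product, so the result is again a 3x3 matrix.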

Here the normal vector n is the plane normal expressed in camera frame 1. It can be computed as the cross product of two vectors (built from three non-collinear points that lie on the plane) or, in our case, directly with:

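  // Plane normal in the object frame (the chessboard is taken to lie in the Z = 0 plane)
  cv::Mat normal = (cv::Mat_<double>(3,1) << 0, 0, 1);
  // Same normal expressed in camera frame 1 (R1: rotation matrix of the first camera pose)
  cv::Mat normal1 = R1*normal;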

The distance d can be computed as the dot product between the plane normal and a point on the plane, or by computing the plane equation and using the D coefficient (note that the snippet below negates the point, which flips the sign of d):

  cv::Mat origin(3,1,CV_64F,cv::Scalar(0));
  cv::Mat origin1 = R1*origin + tvec1;
  double d_inv1 = 1.0 / normal1.dot(-origin1);
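For the general case mentioned above (three non-collinear points on the plane), a minimal sketch in plain C++ could look like this. The Plane struct and the function names are hypothetical, and the sign of both n and d depends on the order of the points, so it has to be matched to the convention of the homography formula being used:

```cpp
#include <array>
#include <cmath>
#include <stdexcept>

using Vec3 = std::array<double, 3>;

Vec3 subtract(const Vec3& a, const Vec3& b) {
    return {a[0] - b[0], a[1] - b[1], a[2] - b[2]};
}

Vec3 cross(const Vec3& a, const Vec3& b) {
    return {a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]};
}

double dot(const Vec3& a, const Vec3& b) {
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
}

// Unit normal n and signed distance d = n . p0 of the plane through p0, p1, p2.
struct Plane {
    Vec3 normal;
    double d;
};

Plane planeFromPoints(const Vec3& p0, const Vec3& p1, const Vec3& p2) {
    // The cross product of two in-plane vectors is orthogonal to the plane.
    Vec3 n = cross(subtract(p1, p0), subtract(p2, p0));
    const double norm = std::sqrt(dot(n, n));
    if (norm < 1e-12) {
        throw std::invalid_argument("the three points are collinear");
    }
    for (double& c : n) c /= norm;
    return {n, dot(n, p0)};
}
```

For example, the points (0, 0, 2), (1, 0, 2) and (0, 1, 2), expressed in the camera frame, give n = (0, 0, 1) and d = 2.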

The final homography matrix that can be used to warp the first image into the desired perspective view is (the same camera is used in both images here): KHK_inv

cv::Mat homography = cameraMatrix * (R_1to2-d_inv1*tvec_1to2*normal1.t()) * cameraMatrix.inv();
homography /= homography.at<double>(2,2);

The result is:

homography:
[0.416056997554822, -1.306889022302135, 553.7055454434186;
 0.7917584236503302, -0.06341244862332501, -108.2770023399513;
 0.000592635728708199, -0.00102065172420853, 0.9999999999999999]

With the same visual result (left: warp from findHomography(), right: warp from the homography computed from the camera displacement):

warp compare


Demo 3: decompose the homography matrix to a camera displacement

OpenCV 3 contains the function decomposeHomographyMat(), which decomposes the homography matrix into a set of rotations, translations and plane normals:

  std::vector<cv::Mat> Rs_decomp, ts_decomp, normals_decomp;
  cv::decomposeHomographyMat(homography, cameraMatrix, Rs_decomp, ts_decomp, normals_decomp);

The "correct" values, computed from the two camera poses, are:

rvec_1to2=[-0.09198300622505946, -0.5372581099787472, 1.310868859706331]
t_1to2=[0.1578091503401751, 0.005603438955404258, 0.1383378923943395]
normal1: [0.1973513036075573, -0.6283452083012302, 0.7524857222361636]

The four solutions are:

Rs_decomp[0]=[-0.09198300622506073, -0.5372581099787442, 1.310868859706334]
ts_decomp[0]=[-0.7747960949402362, -0.0275112223310486, -0.6791979969371286]
normals_decomp[0]=[-0.1973513036075609, 0.6283452083012311, -0.7524857222361622]

Rs_decomp[1]=[-0.09198300622506073, -0.5372581099787442, 1.310868859706334]
ts_decomp[1]=[0.7747960949402362, 0.0275112223310486, 0.6791979969371286]
normals_decomp[1]=[0.1973513036075609, -0.6283452083012311, 0.7524857222361622]

Rs_decomp[2]=[0.1053487857879288, -0.1561929289949728, 1.401356547596018]
ts_decomp[2]=[-0.4666552464032777, 0.1050033058302994, -0.9130076461351245]
normals_decomp[2]=[-0.3131715295480532, 0.842120625125061, -0.4390403692367126]

Rs_decomp[3]=[0.1053487857879288, -0.1561929289949728, 1.401356547596018]
ts_decomp[3]=[0.4666552464032777, -0.1050033058302994, 0.9130076461351245]
normals_decomp[3]=[0.3131715295480532, -0.842120625125061, 0.4390403692367126]

According to the documentation:

At least two of the solutions may further be invalidated if point correspondences are available by applying positive depth constraint (all points must be in front of the camera).
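As an illustration of that constraint, here is a rough sketch in plain C++ that keeps only the solutions whose normal n satisfies n . m > 0 for every reference point m = (x, y, 1) given in normalized coordinates of camera 1. The sign convention is an assumption on my side (it agrees with the "correct" normal1 listed above), the function name is hypothetical, and a complete check would also test the points in the second camera:

```cpp
#include <array>
#include <cstddef>
#include <vector>

using Vec3 = std::array<double, 3>;

// Return the indices of the solutions whose plane normal faces every
// reference point, i.e. n . m > 0 for all m (assumed convention).
std::vector<std::size_t> visibleSolutions(const std::vector<Vec3>& normals,
                                          const std::vector<Vec3>& refPoints) {
    std::vector<std::size_t> kept;
    for (std::size_t i = 0; i < normals.size(); ++i) {
        bool ok = true;
        for (const Vec3& m : refPoints) {
            const double prod = normals[i][0] * m[0] +
                                normals[i][1] * m[1] +
                                normals[i][2] * m[2];
            if (prod <= 0.0) {
                ok = false;
                break;
            }
        }
        if (ok) kept.push_back(i);
    }
    return kept;
}
```

With the four normals listed above and a single reference point (0, 0, 1), only solutions 1 and 3 are kept, which is consistent with "at least two of the solutions" being rejected.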

The translation is recovered up to a scale factor (same conclusion in this post), which in fact corresponds to the distance d. Here, all four solutions provide a visually correct warping:

cv::Mat homography_decomp_original = computeHomography(Rs_decomp[i], ts_decomp[i], -1.0, normals_decomp[i]); //formula to compute H from the camera displacement
cv::Mat homography_decomp = cameraMatrix * homography_decomp_original * cameraMatrix.inv();
homography_decomp /= homography_decomp.at<double>(2,2);

The homography matrix reconstructed for the first solution is:

homography_decomp:
[0.4160569975548221, -1.306889022302135, 553.7055454434186;
 0.7917584236503303, -0.06341244862332487, -108.2770023399513;
 0.0005926357287081991, -0.00102065172420853, 1]
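The scale factor between a decomposed translation and the metric one can be checked numerically. A minimal sketch in plain C++, where fitScale and maxResidual are hypothetical helper names: it fits the single scale s that best maps the decomposed translation onto the reference one, then reports how well the two vectors line up:

```cpp
#include <algorithm>
#include <array>
#include <cmath>
#include <cstddef>

using Vec3 = std::array<double, 3>;

double dot(const Vec3& a, const Vec3& b) {
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
}

// Least-squares scale s minimizing |s * tDecomp - tMetric|.
double fitScale(const Vec3& tDecomp, const Vec3& tMetric) {
    return dot(tDecomp, tMetric) / dot(tDecomp, tDecomp);
}

// Largest absolute component of s * tDecomp - tMetric.
double maxResidual(const Vec3& tDecomp, const Vec3& tMetric, double s) {
    double worst = 0.0;
    for (std::size_t i = 0; i < 3; ++i)
        worst = std::max(worst, std::fabs(s * tDecomp[i] - tMetric[i]));
    return worst;
}
```

Applied to ts_decomp[1] and t_1to2 from the listings above, the fitted scale is roughly 0.2037 and the residual is negligible, so ts_decomp[1] is t_1to2 divided by that value.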

Note: there is a minor difference between the Wikipedia source and the reference paper of decomposeHomographyMat() (Deeper understanding of the homography decomposition for vision-based control):

  • H = R - tn/d on Wikipedia but H = R + tn/d in the paper

This looks like either a misunderstanding on my part or a difference in convention (maybe in how d, or its sign, is computed); to be checked.
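One possible reading, based only on the code in this answer and not checked against the paper: the snippet computes d as normal1.dot(-origin1), which is the opposite of the plain dot product between the normal and a point X_1 of the plane. Writing d' = n^T X_1 = -d, the two forms coincide:

```latex
R - \frac{t \, n^{T}}{d}
= R - \frac{t \, n^{T}}{-d'}
= R + \frac{t \, n^{T}}{d'},
\qquad d' = n^{T} X_{1} = -d
```

This would also be consistent with the call computeHomography(Rs_decomp[i], ts_decomp[i], -1.0, normals_decomp[i]) above, where the inverse distance is passed as -1.0.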