Update:
Another way to see this is the following:
Let's retrieve the translation and rotation vectors from solvePnP:
retval, rvec, tvec = cv2.solvePnP(obj_pts, img_pts, cam_mat, dist_coeffs, rvec, tvec, flags=cv2.SOLVEPNP_ITERATIVE)
Then, let's rebuild the rotation matrix from the rotation vector:
rmat = cv2.Rodrigues(rvec)[0]
Projection matrix:
And finally, create the projection matrix as P = [R | t], with an extra row [0, 0, 0, 1] to make it square:
P = np.zeros((4,4))
P[:3,:3] = rmat
P[:3,3] = tvec.T # need to transpose tvec in order to fit with destination shape!
P[3,3] = 1
print(P)
[[ 6.08851883e-01 2.99048587e-01 7.34758006e-01 -4.75705058e+01]
[ 6.78339121e-01 2.83943605e-01 -6.77666634e-01 -3.24002911e+01]
[ -4.11285086e-01 9.11013706e-01 -2.99767575e-02 2.24834560e+01]
[ 0.00000000e+00 0.00000000e+00 0.00000000e+00 1.00000000e+00]]
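As a side note, the same 4x4 matrix can be built starting from np.eye(4), which gives the [0, 0, 0, 1] row for free and avoids the transpose on tvec (a small sketch; the numeric values are simply copied from the printout above):

```python
import numpy as np

# Pose returned by solvePnP, values copied from the printout above
rmat = np.array([[ 0.608851883,  0.299048587,  0.734758006],
                 [ 0.678339121,  0.283943605, -0.677666634],
                 [-0.411285086,  0.911013706, -0.0299767575]])
tvec = np.array([[-47.5705058], [-32.4002911], [22.4834560]])

# Start from the identity so the bottom row is already [0, 0, 0, 1]
P = np.eye(4)
P[:3, :3] = rmat
P[:3, 3] = tvec.ravel()  # ravel() flattens (3, 1) to (3,), no transpose needed
print(P)
```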
If I understand correctly, this matrix (does it have a name?), combined with the camera intrinsic parameter matrix, brings points from the world reference frame to the camera reference frame.
Checking:
This is easily checked by drawing the projected points onto the original image:
KP = np.dot(cam_mat, P[:3])  # compute the 3x4 product once
projected_points = [np.dot(KP, op) / np.dot(KP, op)[2] for op in obj_pts]
where cam_mat is the intrinsic parameter matrix of the camera (basically with the focal lengths as the first two elements of the diagonal and the center coordinates as the first two elements of the third column), and where obj_pts is an array of point coordinates expressed in the world reference frame, in homogeneous coordinates, for example: [10., 60., 0., 1.].
The projected points can then be drawn on the image:
[cv2.circle(img, (int(p[0]), int(p[1])), 10, (0, 0, 255), -1) for p in projected_points]  # projected_points is a plain list (no .tolist()), and cv2.circle needs integer pixel coordinates
It works well. Projected points are near the original points.
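For reference, the per-point list comprehension above can be replaced by a single matrix product over all points at once (a sketch: cam_mat below is a made-up example intrinsic matrix, not the real one, and the pose values are copied from the printout above):

```python
import numpy as np

# Made-up example intrinsics: focal lengths on the diagonal,
# principal point in the third column
cam_mat = np.array([[800.0,   0.0, 320.0],
                    [  0.0, 800.0, 240.0],
                    [  0.0,   0.0,   1.0]])

# Pose matrix, values copied from the printout above
P = np.array([[ 0.608851883,  0.299048587,  0.734758006,  -47.5705058],
              [ 0.678339121,  0.283943605, -0.677666634,  -32.4002911],
              [-0.411285086,  0.911013706, -0.0299767575,  22.4834560],
              [ 0.0,          0.0,          0.0,            1.0]])

# World points in homogeneous coordinates, one per row
obj_pts = np.array([[10.0, 60.0,  0.0, 1.0],
                    [ 0.0,  0.0, 10.0, 1.0]])

proj = (cam_mat @ P[:3] @ obj_pts.T).T  # one row per point, shape (N, 3)
proj = proj / proj[:, 2:3]              # divide by the third (scale) component
pixels = proj[:, :2]                    # (u, v) pixel coordinates
print(pixels)
```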
Inverse of projection matrix:
Then, here is the inverse transformation, obtained by taking the inverse of the projection matrix:
P_inv = np.linalg.inv(P)
print(P_inv)
[[ 6.08851883e-01 6.78339121e-01 -4.11285086e-01 6.01888871e+01]
[ 2.99048587e-01 2.83943605e-01 9.11013706e-01 2.94301145e+00]
[ 7.34758006e-01 -6.77666634e-01 -2.99767575e-02 1.36701949e+01]
[ 0.00000000e+00 0.00000000e+00 0.00000000e+00 1.00000000e+00]]
Here we can see the camera position expressed in the world reference frame in the last column: X=60.19, Y=2.94, Z=13.67.
This is exactly the same as the following value:
cam_pos = -np.matrix(rmat).T * np.matrix(tvec)
print(cam_pos)
[[ 60.18888712]
[ 2.94301145]
[ 13.67019486]]
And in addition, the upper 3x3 part of P_inv is the rotation matrix that brings points from the camera reference frame to the world reference frame, which is exactly the transpose of rmat, so:
print(P_inv[:3,:3] / rmat.T)  # division of the 3x3 upper part of the inverted projection matrix by rmat.T; should be only ones
[[ 1. 1. 1.]
[ 1. 1. 1.]
[ 1. 1. 1.]]
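A side note: the elementwise-division trick prints ones, but it breaks down as soon as an entry is close to zero; np.allclose does the same comparison more robustly (a sketch reusing the pose values printed above):

```python
import numpy as np

# Pose values copied from the printout above
rmat = np.array([[ 0.608851883,  0.299048587,  0.734758006],
                 [ 0.678339121,  0.283943605, -0.677666634],
                 [-0.411285086,  0.911013706, -0.0299767575]])
tvec = np.array([[-47.5705058], [-32.4002911], [22.4834560]])

P = np.eye(4)
P[:3, :3] = rmat
P[:3, 3] = tvec.ravel()
P_inv = np.linalg.inv(P)

# The rotation block of inv(P) is rmat.T and its last column is -rmat.T @ tvec;
# the tolerance is loose only because the printed values are rounded
print(np.allclose(P_inv[:3, :3], rmat.T, atol=1e-4))           # True
print(np.allclose(P_inv[:3, 3:4], -rmat.T @ tvec, atol=1e-4))  # True
```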
Up there, all is what I expected.
The cv2.decomposeProjectionMatrix() function:
Now, let's have a look at the cv2.decomposeProjectionMatrix() function, which should basically do the same job.
Let's feed this function our projection matrix (this function needs it reshaped to 3x4):
decomp = cv2.decomposeProjectionMatrix(P[:3])
As the doc says:
decomp[0] should be the estimated camera matrix
decomp[1] should be the rotation matrix
decomp[2] should be the translation vector
decomp[3] to decomp[5] should be the 3 individual rotation matrices around the x, y, z camera axes (here comes the previously described issue, on top of this post...)
and decomp[6] should be the Euler angles (around the x, y, z camera axes).
Let's check that:
print(decomp[1] / rmat)
[[ 1. 1. 1.]
[ 1. 1. 1.]
[ 1. 1. 1.]]
Ok, so here we find the same rotation matrix as the input rotation matrix (the upper 3x3 part of our projection matrix P). print(decomp[1] / P[:3,:3]) would have given exactly the same result.
Next, the translation (note: it must be normalized by its last component, which is a scale factor):
print((decomp[2][:3] / decomp[2][3]) / tvec)
[[-1.2652564 ]
[-0.09083287]
[ 0.60801128]]
Oh oh... What's wrong? Here is something I don't understand; I would have expected [1, 1, 1], as for the rotation matrix.
Glancing at decomp[2], one can see that it doesn't look like the input translation vector (unlike the rotation part); rather, it looks like the camera position, and in fact, it is (!):
print((decomp[2][:3] / decomp[2][3]) / cam_pos)
[[ 1.]
[ 1.]
[ 1.]]
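This behaviour makes sense geometrically: the camera centre C is the one point that the 3x4 matrix maps to zero (M @ C = 0 in homogeneous coordinates), so it can be recovered, up to scale, as the right null vector of M, which appears to be what the decomposition hands back as its 4x1 translation output. A numpy-only sketch with the pose values printed above:

```python
import numpy as np

# The 3x4 pose matrix [R | t] printed above
M = np.array([[ 0.608851883,  0.299048587,  0.734758006,  -47.5705058],
              [ 0.678339121,  0.283943605, -0.677666634,  -32.4002911],
              [-0.411285086,  0.911013706, -0.0299767575,  22.4834560]])

# The camera centre is the right null vector of M: M @ C = 0
_, _, vt = np.linalg.svd(M)
C = vt[-1]      # last row of V^T spans the (numerical) null space
C = C / C[3]    # normalize by the last (scale) component, as with decomp[2]
print(C[:3])    # ~ [60.19, 2.94, 13.67], the camera position
```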
And of course, decomposing the inverted projection matrix with decomp_inv = cv2.decomposeProjectionMatrix(P_inv[:3]), the same appears in the reverse direction (I would have expected the camera position here, but it's tvec!):
print((decomp_inv[2][:3] / decomp_inv[2][3]) / tvec)
[[ 1.]
[ 1.]
[ 1.]]
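To close the loop: given the rotation matrix and the camera position that the decomposition effectively returns, the original translation vector is recovered as t = -R @ C, since C = -R.T @ t (a sketch with the values printed above):

```python
import numpy as np

# Rotation matrix and camera position, values printed above
rmat = np.array([[ 0.608851883,  0.299048587,  0.734758006],
                 [ 0.678339121,  0.283943605, -0.677666634],
                 [-0.411285086,  0.911013706, -0.0299767575]])
cam_pos = np.array([[60.18888712], [2.94301145], [13.67019486]])

# cam_pos = -rmat.T @ tvec  <=>  tvec = -rmat @ cam_pos
tvec = -rmat @ cam_pos
print(tvec.ravel())  # ~ [-47.57, -32.40, 22.48]
```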
Summary:
In short:
print(np.hstack((rmat, tvec)) / np.hstack((decomp[1], decomp[2][:3] / decomp[2][3])))
[[ 1. 1. 1. -0.79035364]
[ 1. 1. 1. -11.00923039]
[ 1. 1. 1. 1.64470633]]
with rmat and tvec the inputs that form the projection matrix P given to the cv2.decomposeProjectionMatrix() function.
And:
print(np.hstack((rmat.T, cam_pos)) / np.hstack((decomp_inv[1], decomp_inv[2][:3] / decomp_inv[2][3])))
[[ 1. 1. 1. -1.2652564 ]
[ 1. 1. 1. -0.09083287]
[ 1. 1. 1. 0.60801128]]
with rmat.T and cam_pos the inputs that form the projection matrix P_inv given to the cv2.decomposeProjectionMatrix() function.
Shouldn't there be "ones" everywhere?!