Your assumption about the X and Y components of tvec is right: if you supply an image that shows a perfectly perpendicular, undistorted view with the marker centered in the frame, X and Y should be close to zero.
It seems that cx and cy in your camera matrix are swapped. Camera sensor formats are usually specified in landscape orientation (horizontal x vertical), so cx should be the larger value. If I change this in your code, I get
[-2.276270217259004e-38, 4.629643948357638e-24, 776.3975155294337]
which is pretty much what you expected.
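
To see why a swapped principal point shifts X and Y, here is a minimal sketch using the plain pinhole model (no distortion, no OpenCV calls). The image size, focal length and the helper name `backproject` are made up for illustration and are not taken from your code; only the depth is rounded from the result above.

```python
def backproject(u, v, depth, fx, fy, cx, cy):
    """Camera-frame (X, Y, Z) of pixel (u, v) at a given depth, pinhole model."""
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return (x, y, depth)

# Hypothetical landscape image and intrinsics (assumptions, not your values).
width, height = 640, 480
fx = fy = 800.0
depth = 776.4  # roughly the Z value reported above

# Marker centre sits exactly in the middle of the frame.
u, v = width / 2, height / 2

# Correct principal point: cx is half the width, cy is half the height.
correct = backproject(u, v, depth, fx, fy, cx=width / 2, cy=height / 2)

# Swapped principal point: cx and cy exchanged.
swapped = backproject(u, v, depth, fx, fy, cx=height / 2, cy=width / 2)

print("correct:", correct)
print("swapped:", swapped)
```

With these assumed numbers the correct matrix gives X = Y = 0, while the swapped one gives offsets of about +77.6 and -77.6 in the same unit as the depth, even though the marker is centered.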