I have a left and a right image of a scene, taken with identical cameras. The cameras were placed fairly far apart, about 135 cm, and their viewing directions differ by roughly 30 degrees. I've calibrated the two cameras independently with an asymmetric circles grid; the resulting values seem sane and undistort images in a sane way.

There is an object in the images with known dimensions -- it's a table. By hand, I've identified the x,y pixel coordinates of 8 corresponding key points in each image (6 on the table top plane, 2 below in the table's legs). I know the true 3-D coordinates of those 8 points in the scene because I measured them.

How can I use the 2 camera matrices, 2 distortion vectors, 2 vectors of 8 corresponding 2-D points, and 1 vector of 8 corresponding 3-D points to arrive at a formula/algorithm that approximates new 3-D points given their 2-D location in each image? So far I've been testing by trying to recover the 3-D locations of those same 8 points, but I plan to use the method on new features in phase 2 of this project.

Here's what I've tried so far.

Attempt #1:

stereoCalibrate to get rotation and translation between cameras

stereoRectify to get left and right projection matrices

triangulatePoints using the two projection matrices and the two sets of undistorted points, and convert from homogeneous to "normal" using convertPointsFromHomogeneous

Attempt #2:

solvePnP, independently for left and right, on the 2-D and 3-D points to arrive at rotation and translation

get the relative rotation and translation between the cameras by subtracting one rotation vector from the other and one translation vector from the other (yes, this could easily be wrong)

stereoRectify to get left and right projection matrices

triangulatePoints using the two projection matrices and the two sets of undistorted points, and convert from homogeneous to "normal" using convertPointsFromHomogeneous
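
For what it's worth, rotation vectors are not additive, so the subtraction in Attempt #2 is probably the weak step. Below is a minimal NumPy sketch of how I believe the relative pose should be composed instead, assuming solvePnP's rvec/tvec map world coordinates into each camera's frame (x_cam = R * X_world + t). The names rodrigues and relative_pose are my own placeholders; rodrigues only stands in for cv2.Rodrigues so the sketch runs without OpenCV, and the poses at the bottom are made-up numbers, not my real data.

```python
import numpy as np

def rodrigues(rvec):
    """Rotation vector (axis * angle) -> 3x3 rotation matrix.
    Stand-in for cv2.Rodrigues so this sketch needs only NumPy."""
    rvec = np.asarray(rvec, dtype=float).reshape(3)
    theta = np.linalg.norm(rvec)
    if theta < 1e-12:
        return np.eye(3)
    k = rvec / theta
    Kx = np.array([[0.0, -k[2], k[1]],
                   [k[2], 0.0, -k[0]],
                   [-k[1], k[0], 0.0]])
    return np.eye(3) + np.sin(theta) * Kx + (1.0 - np.cos(theta)) * (Kx @ Kx)

def relative_pose(rvec_l, tvec_l, rvec_r, tvec_r):
    """Pose of the right camera relative to the left one.
    Assumes both inputs are world-to-camera poses, so that
    x_right = R @ x_left + T for a point expressed in the left frame."""
    R_l, R_r = rodrigues(rvec_l), rodrigues(rvec_r)
    t_l = np.asarray(tvec_l, dtype=float).reshape(3, 1)
    t_r = np.asarray(tvec_r, dtype=float).reshape(3, 1)
    R = R_r @ R_l.T          # compose rotations, do not subtract rvecs
    T = t_r - R @ t_l        # translation must be rotated before differencing
    return R, T

# Made-up poses (hypothetical values) to check the composition.
rvec_l, tvec_l = [0.10, -0.20, 0.05], [0.3, -0.1, 2.5]
rvec_r, tvec_r = [0.05, 0.35, -0.02], [-1.0, 0.0, 2.8]
R_rel, T_rel = relative_pose(rvec_l, tvec_l, rvec_r, tvec_r)

X_world = np.array([[0.4], [0.2], [0.7]])
x_left = rodrigues(rvec_l) @ X_world + np.reshape(tvec_l, (3, 1))
x_right = rodrigues(rvec_r) @ X_world + np.reshape(tvec_r, (3, 1))
```

If the assumption about solvePnP's convention holds, R_rel @ x_left + T_rel should reproduce x_right, and the length of T_rel should come out close to the physical distance between the cameras.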

Attempt #3:

solvePnP, independently for left and right, on the 2-D and 3-D points to arrive at rotation and translation

undistortPoints on the 2-D points

make a 3x4 projection matrix for left and right as [R | T] (yes, this could easily be wrong but I must have read it somewhere)

triangulatePoints using the two projection matrices and the two sets of undistorted points, and convert from homogeneous to "normal" using convertPointsFromHomogeneous
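
To make Attempt #3 concrete, here is a minimal NumPy sketch of the projection matrices and a linear (DLT) triangulation, which is my stand-in for triangulatePoints and not OpenCV's actual implementation. My understanding is that [R | T] alone only matches normalized image coordinates (what undistortPoints returns when no new projection matrix is passed), while pixel coordinates need K [R | T]. The function names and all numbers below are hypothetical, synthetic values, not my measurements.

```python
import numpy as np

def projection_matrix(K, R, t):
    """3x4 matrix mapping homogeneous world points to pixels: K [R | t].
    Pass K = np.eye(3) if the 2-D points are normalized coordinates."""
    return K @ np.hstack([R, np.reshape(t, (3, 1))])

def project(P, X):
    """Project Nx3 world points with a 3x4 matrix; returns Nx2 image points."""
    Xh = np.hstack([X, np.ones((len(X), 1))])
    x = (P @ Xh.T).T
    return x[:, :2] / x[:, 2:3]

def triangulate_dlt(P_l, P_r, pts_l, pts_r):
    """Linear triangulation: for each correspondence, solve A X = 0 by SVD."""
    out = []
    for (ul, vl), (ur, vr) in zip(pts_l, pts_r):
        A = np.vstack([ul * P_l[2] - P_l[0],
                       vl * P_l[2] - P_l[1],
                       ur * P_r[2] - P_r[0],
                       vr * P_r[2] - P_r[1]])
        X = np.linalg.svd(A)[2][-1]   # right singular vector of smallest value
        out.append(X[:3] / X[3])      # homogeneous -> Euclidean
    return np.array(out)

# Synthetic setup (hypothetical numbers): left camera at the world origin,
# right camera rotated 30 degrees about the y axis and shifted sideways.
K = np.array([[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]])
a = np.deg2rad(30.0)
R_l, t_l = np.eye(3), np.zeros(3)
R_r = np.array([[np.cos(a), 0.0, np.sin(a)],
                [0.0, 1.0, 0.0],
                [-np.sin(a), 0.0, np.cos(a)]])
t_r = np.array([-1.35, 0.0, 0.3])

X_true = np.array([[0.0, 0.0, 3.0], [0.5, 0.0, 3.0], [0.5, 0.3, 3.2],
                   [0.0, 0.3, 3.2], [-0.2, 0.1, 2.8], [0.3, -0.2, 3.5],
                   [0.1, 0.4, 2.9], [0.4, 0.4, 3.1]])

P_l = projection_matrix(K, R_l, t_l)
P_r = projection_matrix(K, R_r, t_r)
pts_l, pts_r = project(P_l, X_true), project(P_r, X_true)

X_est = triangulate_dlt(P_l, P_r, pts_l, pts_r)
reproj_err = np.abs(project(P_l, X_est) - pts_l).max()
```

On noise-free synthetic data like this, X_est should match X_true and the reprojection error should be essentially zero. Running the same reprojection check on my 8 real points seems like a reasonable way to tell whether the projection matrices or the point coordinates are the inconsistent part.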

If I had to guess, I'd say Attempt #1 is the best, since it uses the higher-level stereoCalibrate and the length of the resulting translation vector is 132 cm. That may be a coincidence, but it is close to the actual distance between the cameras.

However, all these attempts (and many minor variations) give 3D answers that seem to be nonsense. For instance, one of the points is given as 50 meters away from the scene. They don't resemble the 3D points used as inputs.

This is my first OpenCV project, so I'm sure I'm doing something foolish. I have done a lot of reading online trying to find an example that works, but nothing yet. I'd really appreciate any guidance.