Afternoon, all!
I have been banging my head against the problem of building a 3D structure from a set of sequential images (structure from motion) for the past week or so and cannot seem to get a decent result out of it. I would greatly appreciate someone taking the time to go over my steps and let me know if they look correct; I feel like I am missing something small but fundamental.
1. Build camera calibration matrix K and distortion coefficients from the calibration data of the chessboard provided (using findChessboardCorners(), cornerSubPix(), and calibrateCamera()).
2. Pull in the first and third images from the sequence and undistort them using K and the distortion coefficients.
3. Find features to track in the first image (using goodFeaturesToTrack() with a mask that excludes the sides of the image).
4. Track the features in the new image (using calcOpticalFlowPyrLK()).
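In code, steps 1-4 look roughly like this (the (9, 6) pattern size, the file names, the mask margin, and the feature parameters are all placeholders for whatever I actually use):

    import cv2
    import numpy as np

    # Step 1: calibrate from the chessboard images. calib_paths and the
    # (9, 6) inner-corner pattern are placeholders.
    pattern = (9, 6)
    objp = np.zeros((pattern[0] * pattern[1], 3), np.float32)
    objp[:, :2] = np.mgrid[0:pattern[0], 0:pattern[1]].T.reshape(-1, 2)

    obj_pts, img_pts = [], []
    for path in calib_paths:
        gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
        found, corners = cv2.findChessboardCorners(gray, pattern)
        if found:
            corners = cv2.cornerSubPix(
                gray, corners, (11, 11), (-1, -1),
                (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 30, 1e-3))
            obj_pts.append(objp)
            img_pts.append(corners)

    _, K, dist, _, _ = cv2.calibrateCamera(obj_pts, img_pts,
                                           gray.shape[::-1], None, None)

    # Steps 2-4: undistort the pair, find features in i0, track into i2.
    i0 = cv2.undistort(cv2.imread("frame0.png", cv2.IMREAD_GRAYSCALE), K, dist)
    i2 = cv2.undistort(cv2.imread("frame2.png", cv2.IMREAD_GRAYSCALE), K, dist)

    roi = np.zeros_like(i0)
    roi[:, 50:-50] = 255  # mask off the left/right edges
    p0 = cv2.goodFeaturesToTrack(i0, maxCorners=2000, qualityLevel=0.01,
                                 minDistance=8, mask=roi)
    p2, status, _ = cv2.calcOpticalFlowPyrLK(i0, i2, p0, None)
    p0, p2 = p0[status.ravel() == 1], p2[status.ravel() == 1]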
At this point, I have a set of point correspondences in image i0 and image i2.
5. Generate the fundamental matrix F from the point correspondences (using the RANSAC flag in findFundamentalMat()).
6. Correct the point correspondences found earlier using the new F (using correctMatches()).
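Concretely (the RANSAC threshold and confidence here are just the values I picked):

    # Step 5: F via RANSAC; keep only the inlier correspondences.
    pts0 = p0.reshape(-1, 2)
    pts2 = p2.reshape(-1, 2)
    F, inlier_mask = cv2.findFundamentalMat(pts0, pts2, cv2.FM_RANSAC, 1.0, 0.99)
    pts0 = pts0[inlier_mask.ravel() == 1]
    pts2 = pts2[inlier_mask.ravel() == 1]

    # Step 6: correctMatches() wants 1xNx2 arrays, hence the reshaping.
    new0, new2 = cv2.correctMatches(F, pts0.reshape(1, -1, 2).astype(np.float64),
                                    pts2.reshape(1, -1, 2).astype(np.float64))
    pts0, pts2 = new0.reshape(-1, 2), new2.reshape(-1, 2)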
From here, I can generate the essential matrix from F and K and extract candidate projection matrices for the second camera.
7. Generate the essential matrix E using E = K^T * F * K per HZ
8. Use SVD on E to get U, S, and V, which then allow me to build the two candidate rotations and two candidate translations.
9. For each candidate rotation, check that it is right-handed via the sign of its determinant; if det(R) < 0, multiply it through by -1.
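Steps 7-9 in code, continuing from above (W is the matrix from HZ's decomposition, and t comes out only up to sign and scale):

    # Step 7: essential matrix from F and K.
    E = K.T @ F @ K

    # Step 8: SVD gives the two candidate rotations and the translation.
    U, S, Vt = np.linalg.svd(E)
    W = np.array([[0., -1., 0.],
                  [1.,  0., 0.],
                  [0.,  0., 1.]])
    R1, R2 = U @ W @ Vt, U @ W.T @ Vt
    t = U[:, 2].reshape(3, 1)

    # Step 9: force det(R) = +1 so the rotations are right-handed.
    R1 = -R1 if np.linalg.det(R1) < 0 else R1
    R2 = -R2 if np.linalg.det(R2) < 0 else R2

    # The four candidate [R|t] projection matrices for the second camera.
    candidates = [np.hstack((R, s * t)) for R in (R1, R2) for s in (1, -1)]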
Now that I have the 4 candidate projection matrices, I want to figure out which one is the correct one.
10. Normalize the corrected matches for images i0 and i2 (i.e., map the pixel coordinates through K^-1).
11. For each candidate matrix:
11.1. Triangulate the normalized correspondences with P1 = [ I | 0 ] and P2 = the candidate matrix, using triangulatePoints().
11.2. Convert the triangulated 3D points out of homogeneous coordinates.
11.3. Select a test 3D point from the list and apply a perspective transformation to it with perspectiveTransform(), using P2 extended from 3x4 to 4x4 with a last row of [0, 0, 0, 1].
11.4. Check whether the depth of the 3D point and the z-component of the transformed point are both positive, i.e., that the point sits in front of both cameras. If so, take this candidate as P2; otherwise continue.
12. If none of the candidate matrices produces a good P2, go back to step 5. (Steps 10-12 are sketched in code right after this list.)
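Here is roughly how I run that test. The only deviations from the steps above are that I do the 11.3 transform as a plain matrix product instead of perspectiveTransform(), and I vote over all the points rather than picking a single test point (same idea, just less sensitive to one bad match):

    # Step 10: "normalized" = pixel coordinates mapped through K^-1;
    # undistortPoints() with no distortion and no P does exactly that.
    def normalize(pts):
        return cv2.undistortPoints(pts.reshape(-1, 1, 2), K, None).reshape(-1, 2)

    n0, n2 = normalize(pts0), normalize(pts2)

    # Steps 11-12: triangulate with each candidate and keep the one that
    # puts the points in front of both cameras.
    P1 = np.hstack((np.eye(3), np.zeros((3, 1))))
    P2 = None
    for cand in candidates:
        X_h = cv2.triangulatePoints(P1, cand, n0.T, n2.T)   # 4xN homogeneous
        X = (X_h[:3] / X_h[3]).T                            # step 11.2
        z1 = X[:, 2]                                        # depth in camera 1
        z2 = (cand @ np.vstack((X.T, np.ones(len(X)))))[2]  # depth in camera 2
        if np.mean((z1 > 0) & (z2 > 0)) > 0.75:
            P2 = cand
            break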
Now I should have two valid projection matrices, P1 = [ I | 0 ] and P2 derived from E. I then want to use these matrices to triangulate the point correspondences I found back in step 4.
13. Triangulate the normalized correspondence points using P1 and P2.
14. Convert from homogeneous coordinates to get the real 3D points.
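Which at this point is just:

    # Steps 13-14: final triangulation with the chosen P2, then back to
    # Euclidean coordinates (convertPointsFromHomogeneous does step 14).
    X_h = cv2.triangulatePoints(P1, P2, n0.T, n2.T)
    points_3d = cv2.convertPointsFromHomogeneous(X_h.T).reshape(-1, 3)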
I have already encountered a problem here in that the triangulated 3D points NEVER seem to correspond to the original structure. From the mug, they don't seem to form a clear surface, and from the statue, they're either scattered or strung along a line that goes off towards [-∞, -∞, 0] or similar. I am using Matplotlib's Axes3D scatter() method to plot them and see the same results in MATLAB, so I assume it's not an issue with the visualization so much as the points themselves. Any advice or insight at this point alone would be hugely appreciated.
Moving forward, though, it gets a little fuzzy: I am not completely sure how to go about adding the additional frames. Below is my algorithm so far:
1. Store image i2 as the previous image, the image points from i2 as the previous image points, the triangulated 3D points as the corresponding real points, and the projection matrix P2 as the previous P for the loop below.
2. For each next frame iNext:
2.1. Undistort iNext using K and the distortion coefficients
2.2. Track the points from the previous image (in the first loop iteration, I use the points from i2) in the new image to get correspondences.
2.3. Normalize the newly tracked points.
2.4. Use OpenCV's Perspective-n-Point solver (solvePnPRansac()) with the 3D points I triangulated before and the normalized points I tracked in the new frame to get the rotation and translation vectors of the new camera, along with a set of inliers. (As I understand it, since the 3D points are in the first camera's coordinate frame, this pose is relative to that frame rather than to the previous camera; see the sketch after this list.)
2.5. Store the inlier 3D points and image points from iNext
2.6. Find new features to track in the previous image
2.7. Track the new features into the current image to get a new set of correspondences
2.8. Correct and normalize the correspondences
2.9. Triangulate the corrected and normalized correspondences with the previous and current projection matrices to get a new set of 3D points (I do this to account for cases where the original 3D points from the first triangulation in step 14 become occluded).
2.10. Add the list of new 3D and 2D points to the inlier 3D and 2D points from step 2.5.
2.11. Repeat
3. After all of this, I will have built up a list of 3D points: those from the first triangulation between i0 and i2, plus the inliers kept by solvePnPRansac().
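Condensed into code, the loop looks roughly like this (frame_paths is a placeholder for the remaining image files). Two details that seem to matter: LK tracking works in pixel coordinates, and with normalized points solvePnPRansac() needs an identity camera matrix plus a reprojection threshold much smaller than the pixel-scale default of 8.0:

    prev_img, prev_px, prev_X, P_prev = i2, pts2, points_3d, P2

    for path in frame_paths:
        # 2.1: undistort the incoming frame.
        cur = cv2.undistort(cv2.imread(path, cv2.IMREAD_GRAYSCALE), K, dist)

        # 2.2: track the previous (pixel) points into the new frame.
        cur_px, status, _ = cv2.calcOpticalFlowPyrLK(
            prev_img, cur, prev_px.reshape(-1, 1, 2).astype(np.float32), None)
        ok = status.ravel() == 1
        cur_px, X = cur_px.reshape(-1, 2)[ok], prev_X[ok]

        # 2.3-2.4: normalize, then PnP against the existing 3D points. The
        # pose comes back relative to the frame those points live in (the
        # first camera), not the previous camera.
        cur_n = cv2.undistortPoints(cur_px.reshape(-1, 1, 2), K, None)
        _, rvec, tvec, inliers = cv2.solvePnPRansac(
            X.astype(np.float64), cur_n.reshape(-1, 2).astype(np.float64),
            np.eye(3), None, reprojectionError=0.01)
        R_cur, _ = cv2.Rodrigues(rvec)
        P_cur = np.hstack((R_cur, tvec))

        # 2.5: keep only the RANSAC inliers.
        if inliers is not None:
            keep = inliers.ravel()
            cur_px, X = cur_px[keep], X[keep]

        # 2.6-2.10: find fresh features in prev_img, track them into cur,
        # correct/normalize, triangulate with (P_prev, P_cur) as in the
        # two-view code above, and append the results to cur_px and X.

        prev_img, prev_px, prev_X, P_prev = cur, cur_px, X, P_cur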
Unfortunately, the 3D points show nothing in the way of any structure, so I feel like this process of adding new images is wrong...
Any insight would be greatly appreciated, but thanks for taking the time to look over this email either way.
-Cody