Let's say I have two cameras, under the assumption that at any given moment there is some set of features visible to both. They start with a known baseline b between them, displaced along the X axis (OpenCV convention). After that, both cameras are free to translate in X, Y, Z and rotate about all axes, but there are always features visible to both. Given that I know b, my final goal is to accurately compute the subsequent positions of both cameras in real time (although for now I don't need real-time performance).
I have two approaches in mind for this:
Approach 1:
Triangulate the first pair of images (image0_L, image0_R) and pick a pair of points from the resulting 3D set. Compute the distance between those points and store it as d1. After the cameras have moved a little, we obtain image1_L and image1_R. Triangulate image1_L against image0_R, track that same pair of points, and compute the new distance d2. The ratio d1/d2 should tell us how much camera1 has moved with respect to camera2. Repeat with image1_R and image0_L to get camera2's position w.r.t. camera1. This is pretty computationally expensive, and it also feels like a very naive implementation. A rough sketch of the triangulation step is below.
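To make this concrete, the triangulation step I have in mind looks roughly like this (a minimal sketch, not my actual code; K, b and the point arrays are placeholders, and I'm assuming rectified images with the right camera displaced by b along +X of the left one):

```python
import numpy as np
import cv2

def triangulate_pair(K, b, pts_L, pts_R):
    """Triangulate matched pixel coordinates (Nx2 float arrays) from a pair of
    views, assuming the second camera sits at +b along X of the first one."""
    P_L = K @ np.hstack([np.eye(3), np.zeros((3, 1))])            # first camera at the origin
    P_R = K @ np.hstack([np.eye(3), np.array([[-b], [0.0], [0.0]])])  # second camera offset by b
    pts4d = cv2.triangulatePoints(P_L, P_R, pts_L.T, pts_R.T)     # 4xN homogeneous points
    return (pts4d[:3] / pts4d[3]).T                               # Nx3 Euclidean points

# Intended usage (pts0_L, pts0_R, pts1_L are Nx2 arrays of the same tracked points):
# X0 = triangulate_pair(K, b, pts0_L, pts0_R)   # original pair (image0_L, image0_R)
# X1 = triangulate_pair(K, b, pts1_L, pts0_R)   # cross-time pair (image1_L, image0_R)
# d1 = np.linalg.norm(X0[0] - X0[1])
# d2 = np.linalg.norm(X1[0] - X1[1])
# scale = d1 / d2                               # the ratio described above
```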
Approach 2:
Construct the 3D point cloud from image0_L and image0_R. Match image1_L against image0_R to see which features from image0_R are still visible. Take only the points in the original 3D cloud that correspond to these "tracked, still visible" features, use those object points as the reference, and run a PnP algorithm on the image points from image1_R and image1_L to obtain the positions of both cameras (the core of this step is sketched below). This is not as computationally intensive as approach 1, but it's giving me very erratic and unreliable results. I'm not very knowledgeable about the theoretical side of CV, so I feel I'm failing to spot a more elegant solution for this. Any suggestions or comments would be very helpful, thanks!
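The PnP step I have in mind is roughly this (a stripped-down sketch, not my exact code; variable names are placeholders, I'm showing the RANSAC variant since re-matched features tend to contain outliers, and I'm assuming the 3D points are expressed in the frame of the left camera at time 0):

```python
import numpy as np
import cv2

def camera_pose_from_pnp(object_pts, image_pts, K):
    """object_pts: Nx3 points from the original cloud (frame of camera_L at time 0).
    image_pts:  Nx2 pixel coordinates of the same points in the new image.
    Returns (R, C): rotation and camera centre of the new view in that frame."""
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(
        object_pts.astype(np.float64),
        image_pts.astype(np.float64),
        K.astype(np.float64),
        distCoeffs=None)
    if not ok:
        raise RuntimeError("PnP failed")
    R, _ = cv2.Rodrigues(rvec)   # solvePnP returns the world-to-camera transform,
    C = -R.T @ tvec              # so the camera position in the world frame is -R^T t
    return R, C
```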