Camera Pose Estimation With OpenCV and Python
I'm trying to use OpenCV with Python to track camera pose in a video stream. As a test environment, I have a sample of code that determines the pose between two images.
The overall flow here is this:

1. Read in the images, resize them, and convert to grayscale.
2. Extract features from the first image with `cv2.goodFeaturesToTrack`.
3. Use `cv2.calcOpticalFlowPyrLK` to find the matching points in the second image.
4. Convert the p1 points (from the starting image) to (x, y, z), with z set to 0 for all points.
5. Solve `cv2.solvePnPRansac` to get the rotation and translation vectors.
6. Convert the angles from radians to degrees.
```python
import math

import cv2
import numpy as np


def function(mtx, dist):
    # Shi-Tomasi corner detection parameters
    feature_params = dict(maxCorners=100,
                          qualityLevel=0.3,
                          minDistance=7,
                          blockSize=7)
    # Lucas-Kanade optical flow parameters
    lk_params = dict(winSize=(15, 15),
                     maxLevel=2,
                     criteria=(cv2.TERM_CRITERIA_EPS |
                               cv2.TERM_CRITERIA_COUNT, 10, 0.03))

    # image 1: read, resize, convert to grayscale, extract features
    image_1 = cv2.imread("/Users/johnmcconnell/Desktop/Test_images/Test_image.jpg")
    image_1 = cv2.resize(image_1, (640, 480))
    gray_1 = cv2.cvtColor(image_1, cv2.COLOR_BGR2GRAY)
    p1 = cv2.goodFeaturesToTrack(gray_1, mask=None, **feature_params)

    # image 2: the same image, to test for a near-zero pose
    image_2 = cv2.imread("/Users/johnmcconnell/Desktop/Test_images/Test_image.jpg")
    image_2 = cv2.resize(image_2, (640, 480))
    gray_2 = cv2.cvtColor(image_2, cv2.COLOR_BGR2GRAY)
    p2, st, err = cv2.calcOpticalFlowPyrLK(gray_1, gray_2, p1, None, **lk_params)

    # convert the old points to 3D by appending z = 0
    zeros = np.zeros([len(p1), 1], dtype=np.float32)
    old_3d = np.dstack((p1, zeros))

    # get rotation and translation vectors
    retval, rvecs, tvecs, inliers = cv2.solvePnPRansac(old_3d, p2, mtx, dist)

    # convert from radians to degrees
    rad = 180 / math.pi
    roll = rvecs[0] * rad
    pitch = rvecs[1] * rad
    yaw = rvecs[2] * rad

    print(roll)
    print(pitch)
    print(yaw)
    print(tvecs)


function(mtx, dist)
```
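For reference, the `rvecs` returned by `cv2.solvePnPRansac` is a Rodrigues axis-angle vector, not a triple of Euler angles, so scaling its components by 180/π only approximates roll/pitch/yaw for small rotations. A minimal NumPy sketch of the full conversion (the ZYX Euler convention here is my assumption):

```python
import numpy as np


def rodrigues_to_matrix(rvec):
    """Convert a Rodrigues axis-angle vector to a 3x3 rotation matrix."""
    rvec = np.asarray(rvec, dtype=np.float64).reshape(3)
    theta = np.linalg.norm(rvec)
    if theta < 1e-12:
        return np.eye(3)
    k = rvec / theta  # unit rotation axis
    K = np.array([[0.0, -k[2], k[1]],
                  [k[2], 0.0, -k[0]],
                  [-k[1], k[0], 0.0]])
    # Rodrigues formula: R = I + sin(theta) K + (1 - cos(theta)) K^2
    return np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * (K @ K)


def matrix_to_euler_zyx(R):
    """Extract (roll, pitch, yaw) in degrees, ZYX convention."""
    pitch = np.arcsin(np.clip(-R[2, 0], -1.0, 1.0))
    roll = np.arctan2(R[2, 1], R[2, 2])
    yaw = np.arctan2(R[1, 0], R[0, 0])
    return np.degrees([roll, pitch, yaw])


rvec = np.array([0.0, 0.0, 0.2])  # 0.2 rad about the camera z-axis
print(matrix_to_euler_zyx(rodrigues_to_matrix(rvec)))  # approximately [0, 0, 11.46]
```

(OpenCV's own `cv2.Rodrigues` does the vector-to-matrix step; the pure-NumPy version above just makes the math explicit.)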
Given that I am using exactly the same image for both inputs, I expected the rotation and translation vectors to be very close to zero. However, they are quite high; take a look at the sample output below. Additionally, with different images that have a known translation, the vectors are very wrong.

So, the questions at hand: Is my method sound? Have I approached this problem correctly? Have I matched the points correctly? Is this level of noise normal, or is there something I can do about it?
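One way to probe the matching question: the code above ignores the `st` status array that `cv2.calcOpticalFlowPyrLK` returns, so points the tracker failed to follow are still fed into `solvePnPRansac`. A small sketch of filtering them out, assuming the usual `(N, 1, 2)` point and `(N, 1)` status shapes:

```python
import numpy as np


def filter_tracked(p1, p2, st):
    """Keep only the point pairs the LK tracker marked as found (st == 1)."""
    good = st.reshape(-1) == 1
    return p1[good], p2[good]
```

With identical input images, the surviving pairs in `p1` and `p2` should be (nearly) coincident, which is easy to assert before handing them to the pose solver.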