Hello,
I'm trying to build a fairly standard stereoscopic marker tracking system, but I'm a bit stuck. I'd like to determine the pose of a cluster of markers relative to the left camera, but there seems to be an offset along the x (horizontal) axis and some rotation about the z axis as well.
To illustrate the problem, I've calculated the pose relative to each camera separately using a single-camera method, and then used stereo matching and a least-squares rigid fit to determine the pose from both views.
Here's an example of the results:
Mono Left - R: [3.069579943540829; -0.05977419625323342; 0.1544040260297337]
T: [38.09605707684862; 21.18416077979643; 163.7890550734082]
Mono Right - R: [3.004481122258051; -0.0503887459417317; 0.1974314610837208]
T: [-40.78674398112144; 22.07558311940088; 167.3692601754227]
Stereo - R: [3.040724049612813; -0.09136183473500635; 0.3408090688901517]
T: [24.53711327862293; 21.23019811829612; 161.4103927579672]

Mono Left - R: [3.058489621212333; -0.05228215250008787; 0.13683463230099]
T: [38.96282303353249; 22.00846031736913; 186.9398802020772]
Mono Right - R: [2.961243386981001; -0.05200500666843982; 0.1801525489916241]
T: [-39.02619557523813; 23.22876352563853; 191.2417072658133]
Stereo - R: [3.045682764179083; -0.08502438636624524; 0.3180723793202949]
T: [23.8027339111667; 22.44540681050449; 185.6476516191155]

Mono Left - R: [3.080152177836276; -0.04957840617637835; -0.1266525035770767]
T: [66.53298620735731; 22.29216647181219; 187.5828556316184]
Mono Right - R: [2.935352608593301; -0.034696380289205; -0.0137351212138292]
T: [-11.92929769108779; 23.58438750381959; 189.9232143310567]
Stereo - R: [3.056379378076835; -0.07589959397021091; 0.09381399770267766]
T: [50.60496929247539; 22.09750040313026; 188.0577998604219]
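For context, the single-camera poses above come from a standard model-to-image pose fit; a minimal sketch of that kind of step (assuming cv::solvePnP with each camera's calibrated intrinsics K and distortion dist, and the detected marker centres imagePts; my actual mono code may differ in detail) would be:

//Sketch only: per-camera pose of the 4-marker cluster
//'object' is the same marker model used in the stereo code below;
//imagePts, K and dist are placeholders for that camera's detections and calibration
cv::Mat rvecMono, tvecMono;
cv::solvePnP(object, imagePts, K, dist, rvecMono, tvecMono);
std::cout << "Mono - R: " << rvecMono << "\nT: " << tvecMono << std::endl;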
I remap (rectify) the images from both cameras, then find the four markers in each image. After that I use the following code to estimate the pose of the marker cluster.
//Build (x, y, disparity) triples from the matched marker centres in the
//rectified left/right images
ARg::Point3dv disparity;
for (size_t i = 0; i < markerLeft->size(); i++) {
    disparity.push_back(cv::Point3d(markerLeft->at(i).x,
                                    markerLeft->at(i).y,
                                    markerLeft->at(i).x - markerRight->at(i).x));
}

//Reproject to 3D in the left (rectified) camera frame using the Q matrix
ARg::Point3dv image;
cv::perspectiveTransform(disparity, image, left.Q());
//std::cout << "Disparity: " << disparity << "\nImage : " << image << std::endl;

//Mean position of the observed markers (image) and of the untransformed
//model (object), which is centred at (0,0,0)
cv::Scalar imageMean = cv::mean(image);
cv::Scalar objectMean = cv::mean(object);
cv::Mat iMean = (cv::Mat_<double>(3, 1) << imageMean(0), imageMean(1), imageMean(2));
cv::Mat oMean = (cv::Mat_<double>(3, 1) << objectMean(0), objectMean(1), objectMean(2));

//Correlation matrix C = (1/n) * SUM{ (obj_i - objMean) * (img_i - imgMean)^T }
cv::Mat C = cv::Mat::zeros(3, 3, CV_64F);
for (size_t i = 0; i < image.size(); i++) {
    cv::Mat A = (cv::Mat_<double>(3, 1) << object.at(i).x - objectMean(0),
                                           object.at(i).y - objectMean(1),
                                           object.at(i).z - objectMean(2));
    cv::Mat B = (cv::Mat_<double>(1, 3) << image.at(i).x - imageMean(0),
                                           image.at(i).y - imageMean(1),
                                           image.at(i).z - imageMean(2));
    C += A * B;                           //outer product of centred model/measurement
}
C /= 4.0;                                 //n = 4 markers

//SVD of C, with the Kanatani determinant correction to guarantee a proper rotation
cv::Mat w, u, vt;
cv::SVDecomp(C, w, u, vt);
cv::Mat correct = cv::Mat::eye(3, 3, CV_64F);
correct.at<double>(2, 2) = cv::determinant(u * vt);

//As built, u*correct*vt takes measured deviations onto the model, so invert it
//to get the model -> left-camera rotation
cv::Mat R = u * correct * vt;
cv::invert(R, R);                         //transpose would do the same for a rotation
cv::Mat T = iMean - R * oMean;

//Rodrigues vector for display
cv::Mat rvec;
cv::Rodrigues(R, rvec);
std::cout << "Stereo - R: " << rvec << "\nT: " << T << std::endl;
Any ideas where my error might be coming from? My guess would be the perspectiveTransform step, but short of a bad calibration I can't see how that would go wrong.
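If it is the perspectiveTransform, one check I can do is to reproject a single marker by hand with the textbook rectified-stereo formulas, X = (x - cx)*B/d, Y = (y - cy)*B/d, Z = fx*B/d, and compare against the Q-based result (this assumes the usual Q from cv::stereoRectify with matched principal points; fx, cx, cy and the baseline B are placeholders for my rectified calibration values):

//Sketch: manual reprojection of the first marker vs. the Q-based result
const cv::Point3d& m = disparity.at(0);   //(x, y, d) from the code above
double Xm = (m.x - cx) * B / m.z;
double Ym = (m.y - cy) * B / m.z;
double Zm = fx * B / m.z;
std::cout << "Manual : " << cv::Point3d(Xm, Ym, Zm)
          << "\nQ-based: " << image.at(0) << std::endl;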
Thanks for any advice.