OpenCL BruteForceMatcher slow and faulty

asked 2013-01-29 08:29:34 -0500

Notas

updated 2013-01-29 08:37:03 -0500

Hello. I want to match pictures that differ greatly from one another, and after a lot of testing I found the combination ORB+FREAK to work best. I need at least 4000 keypoints to get reliable results across the images.

The problem is the computation time. I want to cut it down as much as possible, so I looked into OpenCV's OpenCL implementation (CUDA only works on NVIDIA cards, so it's not an option for me).

However, not only is the OpenCL BruteForceMatcher slower than the CPU version by a factor of 1.8, it also finds fewer matches. With identical descriptors as input, that shouldn't be possible.
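A brute-force matcher is deterministic: for the same descriptors and the same distance measure, any correct implementation should return the same nearest-neighbour distances. One way to find out which side is wrong is to compare both matchers against a tiny reference on a handful of descriptors. The following is a minimal sketch in plain C++ with no OpenCV dependency. `Descriptor`, `Neighbour`, `hammingDistance` and `referenceKnn2` are hypothetical names chosen for illustration, and the sketch assumes binary descriptors compared with the Hamming distance, as is usual for FREAK.

```cpp
#include <bitset>
#include <cassert>
#include <climits>
#include <cstddef>
#include <cstdint>
#include <vector>

// One binary descriptor as raw bytes (hypothetical type for this sketch).
typedef std::vector<std::uint8_t> Descriptor;

// A candidate match: index into the train set plus its Hamming distance.
struct Neighbour {
    int trainIdx;
    int distance;
};

// Hamming distance: number of bits that differ between two equal-length descriptors.
int hammingDistance(const Descriptor& a, const Descriptor& b) {
    assert(a.size() == b.size());
    int dist = 0;
    for (std::size_t i = 0; i < a.size(); ++i)
        dist += static_cast<int>(std::bitset<8>(a[i] ^ b[i]).count());
    return dist;
}

// For every query descriptor, return up to two nearest train descriptors, best first.
// Ties keep the lower train index, because only strictly smaller distances replace an entry.
std::vector<std::vector<Neighbour> > referenceKnn2(
    const std::vector<Descriptor>& query,
    const std::vector<Descriptor>& train) {
    std::vector<std::vector<Neighbour> > result(query.size());
    for (std::size_t q = 0; q < query.size(); ++q) {
        Neighbour best = { -1, INT_MAX };
        Neighbour second = { -1, INT_MAX };
        for (std::size_t t = 0; t < train.size(); ++t) {
            Neighbour cand = { static_cast<int>(t), hammingDistance(query[q], train[t]) };
            if (cand.distance < best.distance) {
                second = best;
                best = cand;
            } else if (cand.distance < second.distance) {
                second = cand;
            }
        }
        if (best.trainIdx >= 0) result[q].push_back(best);
        if (second.trainIdx >= 0) result[q].push_back(second);
    }
    return result;
}
```

When comparing against a real matcher, it is safer to compare distances than indices, since implementations may break ties between equally distant descriptors differently.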

My machine specs:

Core 2 Duo E7300 @ 2.66 GHz

GeForce 9500 GT (not the fastest, but it shouldn't be this much slower!)

Windows XP 32-bit, 4 GB RAM (not all of it usable, but that is irrelevant for this small application)

Visual Studio 2010, with the OpenCL module compiled in release mode

The following is some example code. Parts of it are from the example in the OpenCV book. The important stuff happens in the main method. Pass the compiled exe two image files and it will try to match them and output the result. To switch between CPU and GPU, (un)comment the #define.

The images I used for comparison are first and second

The CPU version matches in 0.815s and finds 1530 matches.

The GPU version matches in 1.52s and finds 906 matches.
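It may be worth checking how such times are taken. OpenCL kernels are typically compiled at run time on first use, so a single timed call can include one-off setup cost, and the timed work should also include whatever forces the GPU result to be finished (for example, downloading the matches). The following is a minimal sketch of a timing helper with untimed warm-up runs. `meanSeconds` is a hypothetical helper, not part of OpenCV, and it assumes C++11's `<chrono>`, which Visual Studio 2010 does not ship; there, `cv::getTickCount()` together with `cv::getTickFrequency()` could play the same role.

```cpp
#include <chrono>

// Call `work` `warmup` times without timing, then `runs` times with timing,
// and return the mean wall-clock time per timed run in seconds.
template <typename Work>
double meanSeconds(Work work, int warmup, int runs) {
    for (int i = 0; i < warmup; ++i)
        work();  // absorbs one-off costs such as kernel compilation
    const std::chrono::steady_clock::time_point start =
        std::chrono::steady_clock::now();
    for (int i = 0; i < runs; ++i)
        work();
    const std::chrono::duration<double> elapsed =
        std::chrono::steady_clock::now() - start;
    return elapsed.count() / runs;
}
```

Passing the CPU match call and the GPU match call to the same helper, with at least one warm-up run each, would show whether the factor of 1.8 survives once setup cost is excluded.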

I am using OpenCV 2.4.3.

Why is the BruteForceMatcher_OCL not working correctly? Have others used it, and if so, what were your results?

I also hit another bug: if I used more than 10000 keypoints, the GPU version of my application would crash with CL_OUT_OF_RESOURCES in initialization.cpp. How could the GPU run out of memory on a few thousand keypoints? FREAK descriptors are so small that they shouldn't use much memory at all!
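To put numbers on the memory question: a FREAK descriptor is 512 bits, i.e. 64 bytes per keypoint, so the descriptors themselves are indeed tiny. The following is a rough back-of-the-envelope sketch. `descriptorBytes` and `denseDistanceMatrixBytes` are hypothetical helpers for illustration, and the dense query-by-train distance matrix with 4-byte entries is only an assumed worst case; nothing here says the OpenCL matcher actually allocates one.

```cpp
#include <cstdint>

// FREAK yields a 512-bit descriptor, i.e. 64 bytes per keypoint.
const std::uint64_t kFreakDescriptorBytes = 64;

// Memory needed to hold the descriptors of `keypoints` keypoints.
std::uint64_t descriptorBytes(std::uint64_t keypoints) {
    return keypoints * kFreakDescriptorBytes;
}

// Hypothetical worst case: a dense distance matrix with one 4-byte
// entry per (query, train) pair. This is an assumption for scale only.
std::uint64_t denseDistanceMatrixBytes(std::uint64_t queryCount,
                                       std::uint64_t trainCount) {
    return queryCount * trainCount * 4;
}
```

For 10000 keypoints per image, `descriptorBytes(10000)` is 640000 bytes per image, while `denseDistanceMatrixBytes(10000, 10000)` is 400000000 bytes. The descriptors alone cannot explain an out-of-resources error, so any large footprint would have to come from intermediate buffers.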

#include <opencv2/core/core.hpp>
#include <opencv2/highgui/highgui.hpp>
#include <opencv2/nonfree/features2d.hpp>
#include <opencv2/legacy/legacy.hpp>
#include "opencv2/ocl/ocl.hpp"
#include <CL/cl.h>

using namespace cv;
using namespace cv::ocl;
using namespace std;

// Clears every match whose best/second-best distance ratio exceeds 0.7,
// or that has fewer than 2 neighbours. Returns the number of removed matches.
int ratioTest(std::vector<std::vector<cv::DMatch>>
    &matches) {
        int removed=0;
        // for all matches
        for (std::vector<std::vector<cv::DMatch>>::iterator 
            matchIterator= matches.begin();
            matchIterator!= matches.end(); ++matchIterator) {
                // if 2 NN has been identified
                if (matchIterator->size() > 1) {
                    // check distance ratio
                    if ((*matchIterator)[0].distance/
                        (*matchIterator)[1].distance > 0.7) {
                            matchIterator->clear(); // remove match
                            removed++;
                    }
                } else { // does not have 2 neighbours
                    matchIterator->clear(); // remove match
                    removed++;
                }
        }
        return removed;
}

cv::Mat ransacTest(
    const std::vector<cv::DMatch>& matches,
    const std::vector<cv::KeyPoint>& keypoints1, 
    const std::vector<cv::KeyPoint>& keypoints2,
    std::vector<cv::DMatch>& outMatches) {
    // Convert keypoints into Point2f   
    std::vector<cv::Point2f> points1, points2;   
    for (std::vector<cv::DMatch>::
        const_iterator it= matches.begin();
        it!= matches.end(); ++it) {
            // Get the position of left keypoints
            float x= keypoints1[it->queryIdx].pt.x;
            float y= keypoints1[it->queryIdx].pt.y;
            points1.push_back(cv::Point2f ...


No one? Should I file a bug report for this? It would help if someone could first confirm my issue or tell me whether I did something wrong somewhere along the way.

Notas ( 2013-02-05 12:31:35 -0500 )

Hi, I may be experiencing a similar problem and also can't find an answer. Have you considered whether ORB_GPU is producing the same quality of keypoints/descriptors as the CPU version? In my case I have a simple if/then to compute kp/dsc with either ORB_GPU or ORB, followed by my own custom matching code (CPU-based). Using the CPU ORB I get sufficient matches, but using ORB_GPU I get far fewer.

Anyways, perhaps the matcher is not the problem, perhaps ORB_GPU somehow returns lower-quality kp/dsc.

UPDATE: I have submitted my question, maybe you want to watch it also for answers?

RubeRad ( 2013-04-05 12:34:56 -0500 )

Wait, reading closer, it looks like you have exactly the opposite problem from me; your test above seems to show kp/dsc computed by the CPU implementation of ORB, and your problem is with the GPU matcher being not as good as the CPU matcher.

Never mind.

RubeRad ( 2013-04-05 13:15:52 -0500 )