SURF with CUDA is not faster by a noticeable amount

asked 2018-05-16 03:34:49 -0500

I wrote a program to compare the performance of SURF with and without CUDA. The CUDA implementation gave a speedup of only 1.25x.

The following is my code.

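#include <iostream>
#include <vector>
#include <opencv2/opencv.hpp>
#include <opencv2/xfeatures2d.hpp>        // CPU SURF (opencv_contrib)
#include <opencv2/xfeatures2d/cuda.hpp>   // cuda::SURF_CUDA (opencv_contrib)
#include <opencv2/cudafeatures2d.hpp>     // cuda::DescriptorMatcher
using namespace cv;
using namespace std;

// ... inside the processing function:
while (true)
{
        // getTickCount() returns int64; storing it in a float loses precision
        int64 t1, t2;

        // --- CPU version ---
        t1 = getTickCount();
        Ptr<xfeatures2d::SURF> surf = xfeatures2d::SURF::create();
        Mat descriptor_1, descriptor_2;
        vector<KeyPoint> keypoint_1, keypoint_2;
        Ptr<DescriptorMatcher> dscMatcher = DescriptorMatcher::create("BruteForce");
        vector< vector< DMatch > > matches;
        // (CPU detection, description and matching calls not shown in the post)
        t1 = getTickCount() - t1;

        // --- CUDA version ---
        t2 = getTickCount();
        cuda::SURF_CUDA surf_cuda;
        cuda::GpuMat key_1_GPU, key_2_GPU, desc_1_GPU, desc_2_GPU, img_1, img_2;
        Ptr<cuda::DescriptorMatcher> dscMatcher1 = cuda::DescriptorMatcher::createBFMatcher();
        vector< vector< DMatch > > matches1;
        // (GPU upload, detection, description and matching calls not shown in the post)
        t2 = getTickCount() - t2;

        cout << "No cuda : " << t1 / getTickFrequency()
             << "    With Cuda : " << t2 / getTickFrequency() << endl;

        if (waitKey(30) >= 0)
                break;
}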
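Storing `getTickCount()` results in `float` variables (`float t1,t2;`) can distort the measurement: `getTickCount()` returns an `int64` tick count (on Linux builds it is typically nanoseconds from a monotonic clock), while a `float` has only a 24-bit mantissa, so large tick values are rounded before the subtraction. The sketch below uses plain `std::int64_t` and a made-up tick value (an assumption, not taken from the post) to show the effect; `ticks_to_seconds` and `ticks_to_seconds_via_float` are hypothetical helper names.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Converts a tick interval to seconds. The subtraction is done on 64-bit
// integers first, so no precision is lost before dividing by the frequency.
// (Hypothetical helper mirroring getTickCount()/getTickFrequency() usage.)
double ticks_to_seconds(std::int64_t start, std::int64_t end, double ticks_per_second)
{
    return static_cast<double>(end - start) / ticks_per_second;
}

// Same computation, but both timestamps are first converted to float,
// which is what happens with `float t1 = getTickCount(); t1 = getTickCount() - t1;`.
double ticks_to_seconds_via_float(std::int64_t start, std::int64_t end, double ticks_per_second)
{
    const float f_start = static_cast<float>(start);
    const float f_end = static_cast<float>(end);
    return static_cast<double>(f_end - f_start) / ticks_per_second;
}
```

With a nanosecond clock reading around 1e15 (roughly 11.5 days of uptime), adjacent `float` values are about 67 ms apart, so a true interval of 0.43 s comes out as roughly 0.47 s through the `float` path, while the `int64` path stays exact.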


On average, the measured times were 0.54 s without CUDA and 0.43 s with CUDA. I am running the code on an NVIDIA Jetson TX2, and the images I am processing are 900 x 1440.
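When per-frame times are averaged, the first GPU iteration can be unrepresentative: the CUDA runtime creates its context lazily on first use, and constructing `SURF_CUDA`, the matcher and the `GpuMat` buffers inside the loop adds setup and allocation cost to every frame. A minimal sketch of a fairer harness, assuming the detectors and buffers are created once outside the timed callable: run a few untimed warm-up iterations, then report the mean over N timed ones. It uses `std::chrono::steady_clock` instead of `getTickCount()`, and `benchmark_mean_seconds` is a hypothetical helper name.

```cpp
#include <cassert>
#include <chrono>
#include <functional>

// Runs `work` `warmup` times without timing (absorbs one-off costs such as
// lazy CUDA context creation or first-time allocations), then returns the
// mean wall-clock time in seconds over `iterations` timed runs.
double benchmark_mean_seconds(const std::function<void()>& work,
                              int warmup, int iterations)
{
    for (int i = 0; i < warmup; ++i)
        work();

    using clock = std::chrono::steady_clock;
    const auto start = clock::now();
    for (int i = 0; i < iterations; ++i)
        work();
    const std::chrono::duration<double> elapsed = clock::now() - start;

    return iterations > 0 ? elapsed.count() / iterations : 0.0;
}

// Hypothetical usage with OpenCV objects created once, outside the lambdas:
//   double cpu_s = benchmark_mean_seconds(
//       [&] { surf->detectAndCompute(img1, noArray(), keypoint_1, descriptor_1); }, 3, 20);
//   double gpu_s = benchmark_mean_seconds(
//       [&] { surf_cuda(img_1, cuda::GpuMat(), key_1_GPU, desc_1_GPU); }, 3, 20);
```

If any GPU work inside the callable is launched asynchronously, the callable should wait for it to finish before returning; otherwise the timer only measures the launch.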

I then ran the same test with ORB, which gave a speedup of only 1.6x.

A similar question was asked here, but it wasn't answered.

I am wondering whether the problem is in the code I wrote or in the hardware.

Some details

  • OpenCV 3.4
  • CUDA 9.0


Only partly related, but if you do not need rotational invariance, try SURF with the upright flag. It's almost as fast as ORB, even on the CPU.

berak (2018-05-16 04:30:04 -0500)

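So I initialized the detectors like this:

// hessianThreshold, nOctaves, nOctaveLayers, extended, upright
Ptr<xfeatures2d::SURF> surf = xfeatures2d::SURF::create(100, 4, 3, false, true);

// hessianThreshold, nOctaves, nOctaveLayers, extended, keypointsRatio, upright
cuda::SURF_CUDA surf_cuda(100, 4, 3, false, 0.01f, true);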

The time without CUDA dropped to 0.3 s, while the time with CUDA stayed the same.

Kenneth Joseph Paul (2018-05-16 04:38:07 -0500)

OK. Maybe I see a bigger gain because I'm using smaller images.

Also, just saying: your code measures the matching, too, which also takes significant time and computing power.

berak (2018-05-16 04:50:21 -0500)
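berak's point can be made quantitative with Amdahl's law (a standard model, not something measured in this thread): if only a fraction p of the timed work is sped up by a factor s, and the rest (for example host-to-device copies or per-frame object construction, if those do not get faster on the GPU) runs at the same speed, the overall speedup is 1 / ((1 - p) + p / s). The sketch below is a hypothetical helper; the values in the test are illustrative only.

```cpp
#include <cassert>
#include <cmath>

// Amdahl's law: overall speedup when a fraction `p` (0..1) of the original
// run time is sped up by a factor `s` and the remaining (1 - p) is unchanged.
double amdahl_speedup(double p, double s)
{
    return 1.0 / ((1.0 - p) + p / s);
}
```

For instance, if half of the timed region is unaccelerated work, even an arbitrarily fast detector cannot push the overall speedup past 2x, which is why narrowing the timed region to the detection call alone matters for this comparison.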

I tried stopping the timer before the matching step:

Without CUDA: 0.48 s, with CUDA: 0.33 s

Could it be a problem with the hardware?

Kenneth Joseph Paul (2018-05-16 04:56:15 -0500)
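For reference, the speedups implied by the two sets of reported timings, computed as S = t_CPU / t_GPU (first with matching inside the timed region, then with the timer stopped before matching):

$$
S_{\text{with matching}} = \frac{0.54\,\text{s}}{0.43\,\text{s}} \approx 1.26,
\qquad
S_{\text{without matching}} = \frac{0.48\,\text{s}}{0.33\,\text{s}} \approx 1.45
$$

So excluding the matching step raises the measured speedup somewhat, but it still stays well below 2x.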

Sorry, I don't know.

berak (2018-05-16 05:00:02 -0500)

Is there a standard test for comparing CUDA performance against the CPU?

Kenneth Joseph Paul (2018-05-16 05:01:37 -0500)