
OpenCV 3.0.0 and OpenCL benchmark: Sobel edge detection

asked 2015-03-25 09:30:00 -0500

updated 2015-03-25 13:33:16 -0500

I am trying to understand the potential of the OpenCL module of OpenCV 3.0.0.

This is related to a previous question of mine, from which I gathered that there may not be many answers on this topic yet. So I decided to ask you to help me benchmark a piece of code.

I wrote a simple, easy-to-compile code snippet that reads a webcam stream, finds Sobel edges, and then blurs the image a few dozen times. I blur the image repeatedly to put a strain on the processor, but feel free to change that part of the code; what matters is the comparison between the OpenCL and non-OpenCL versions.

EDIT: Edited the code to correct a mistake in the FPS calculation.


#include <iostream>
#include <ctime>
#include <opencv2/imgproc/imgproc.hpp>
#include <opencv2/highgui/highgui.hpp>
#include <opencv2/core/ocl.hpp>

using namespace std;

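int main()
{
    cout << "Have OpenCL?: " << cv::ocl::haveOpenCL() << endl;

    /* Switch OpenCL on/off in the line below */
    cv::ocl::setUseOpenCL(true);
    /*                                      */
    int nBlurs = 50;

    cv::VideoCapture cam;

    // Camera index 0 (the default camera) is assumed here.
    if (!cam.open(0))
    {
        std::cout << "Problem connecting to cam " << std::endl;
        return -1;
    }
    std::cout << "Successfully connected to camera " << std::endl;

    long frameCounter = 0;

    std::time_t timeBegin = std::time(0);
    int tick = 0;

    // Declared outside the loop so the buffers are reused between frames.
    cv::UMat frame;
    cv::UMat frameGray;
    cv::UMat frameSobelx;
    cv::UMat frameSobely;

    cv::UMat frameSobel;
    cv::UMat blurredSobel;

    while (1)
    {
        cam >> frame;
        if (frame.empty())
            break;
        frameCounter++;

        cv::cvtColor(frame, frameGray, cv::COLOR_BGR2GRAY);

        cv::Sobel(frameGray, frameSobelx, frameGray.depth(), 1, 0, 3);
        cv::Sobel(frameGray, frameSobely, frameGray.depth(), 0, 1, 3);

        cv::bitwise_or(frameSobelx, frameSobely, frameSobel);

        // Every pass blurs the same Sobel image; the loop only adds load.
        for (int n = 0; n < nBlurs; n++)
            cv::blur(frameSobel, blurredSobel, cv::Size(3,3));

        cv::imshow("Sobel blurred Frame", blurredSobel);

        // imshow needs waitKey to refresh the window; any key exits
        // (the 1 ms delay is an assumed value).
        if (cv::waitKey(1) >= 0)
            break;

        std::time_t timeNow = std::time(0) - timeBegin;

        // Once per elapsed second, print the frames counted and reset.
        if (timeNow - tick >= 1)
        {
            tick++;
            cout << "Frames per second: " << frameCounter << endl;
            frameCounter = 0;
        }
    }

    return 0;
}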

My System

Processor: Intel Core 2 Duo, 3 GHz

Graphics card: Nvidia GeForce GT 220

OS: Gentoo Linux

My Results

With OpenCL: 21 FPS, 180% CPU usage and 88% GPU usage

Without OpenCL: 26 FPS, 98% CPU usage and 5% GPU usage

This result is extremely strange. How can the GPU be working at 88% capacity and the CPU at 180% with OpenCL, and on top of that the OpenCL version be slower than the non-GPU version?

I'm trying to figure out if this is a hardware problem. Could anyone please compile and run this test?

[Screenshot: OpenCL run]

[Screenshot: non-OpenCL run]

Best regards!




@Pedro Batista I am seeing the same behaviour here with an ATI Radeon 5850 graphics card, a Core 2 Quad 2.6 GHz, Arch Linux, OpenCV 3.0.0 master branch from GitHub, and the Mesa/radeon open-source drivers with OpenCL enabled.

theodore (2015-03-25 12:06:46 -0500)

Wait, I've just realized that the FPS calculation is wrong. I'll update this question soon.

Pedro Batista (2015-03-25 12:09:28 -0500)

Can you please move the cv::UMat declarations out of the while loop and declare them outside, preferably allocating their size and type before entering the loop? This is the only idea I have at the moment. Normally the compiler would optimize this, but perhaps it doesn't.

matman (2015-03-25 13:03:39 -0500)

Edited the question to change the code. Results are the same (I also tried declaring the UMats as pointers and allocating space for them).

Did you test this code, @matman?

EDIT: Actually the results aren't the same: the FPS is still 21, but the CPU usage rises to 180%. This is becoming more and more confusing.

Pedro Batista (2015-03-25 13:27:32 -0500)

At the moment I have no PC with GPU acceleration. At work I tested a loop with just add, subtract, multiply and divide on the CPU (Core 2 Quad), OpenCL and CUDA (GLX280 (not sure)), where OpenCL and CUDA were about 50 times faster than the CPU. I'm interested in comparing CPU and GPU, but at the moment there's no time for more testing. Perhaps I can test your code tomorrow.

matman (2015-03-25 13:45:25 -0500)

I did a quick test using the pre-built OpenCV 3.0.0 Beta on Windows with VS2010 in Release mode. Unfortunately, I see the same issue:

  • CPU=~10 FPS (only 1 of 8 cores used). It is strange that I get worse performance even though I have a Core i7.
  • GPU=~10 FPS (same CPU load) (960 CUDA cores)

It doesn't seem to be a hardware problem, as other people have the same issue. Maybe it is a problem with the beta version? Or maybe it is related to the application (should we try filtering a high-resolution image?).

I have an integrated graphics chipset (a white/red light shows whether the Intel/GPU chipset is in use). No matter whether I set setUseOpenCL to true or false, the light is red, showing that the GPU is used. I use GPU-Z to see the GPU load and it is in fact the Intel chipset ...

Eduardo (2015-03-25 17:16:53 -0500)

I am really starting to believe that it is a problem with the OpenCV 3.0.0 OpenCL module, since everyone is getting these strange results on different systems, and you showed in your answer that the same code is correctly accelerated in OpenCV 2.4.

By the way, it is indeed strange that you only get 10 FPS on your i7 processor.

Pedro Batista (2015-03-26 05:13:51 -0500)

I looked more closely at your results:

Still I don't have a complete answer to explain the issue.

Still strange results for theodore...

Eduardo (2015-03-26 20:57:33 -0500)

@Eduardo when I find some time I will try your example, see how it performs, and report back.

theodore (2015-03-26 21:10:12 -0500)

@Eduardo Even a low-performance video card should outperform the CPU: I have never tried OpenCL through OpenCV, but with direct OpenCL programming I consistently got 4x better frame rates on a GeForce 550M with 96 CUDA cores than on a 2.7 GHz quad-core Core i7.

kbarni (2015-03-27 10:08:10 -0500)

1 answer


answered 2015-03-25 18:21:09 -0500

Eduardo

updated 2015-03-27 15:26:50 -0500

I did a quick test with OpenCV 3.0.0 beta and I realised that the OpenCL version used my Intel HD Graphics instead of my NVidia GPU (see my comments).

I also tested with OpenCV 2.4.10, where it is possible to set the device for OpenCL:

This is the code I use to test OpenCL with OpenCV 2.4.10:

#include <iostream>
#include <string>
#include <opencv2/imgproc/imgproc.hpp>
#include <opencv2/highgui/highgui.hpp>
#include <opencv2/ocl/ocl.hpp>

int main(int argc, char**argv)
{
    int nBlurs = 50;

    // Pass "1" as the first argument to run the OpenCL path.
    bool use_opencl = false;
    if(argc > 1) {
        use_opencl = std::string(argv[1]) == "1";
    }

    std::cout << "use_opencl=" << use_opencl << std::endl;
    cv::VideoCapture cam;

    // Camera index 0 (the default camera) is assumed here.
    if (!cam.open(0)) {
        std::cout << "Problem connecting to cam " << std::endl;
        return -1;
    }
    else {
        std::cout << "Successfully connected to camera " << std::endl;
    }

    cv::Mat frame;
    cv::Mat frameGray;
    cv::Mat frameSobelx;
    cv::Mat frameSobely;
    cv::Mat blurredSobel;

    char c;
    double total_time = 0.0;
    int cpt = 0;

    do {
        // Acquisition happens before the timer starts, so it is not measured.
        cam >> frame;
        cpt++;

        double t = (double) cv::getTickCount();
        if(use_opencl) {
            cv::ocl::oclMat frame_ocl, frameGray_ocl, frameSobelx_ocl, frameSobely_ocl, blurredSobel_ocl;
            // Copy the captured frame to the OpenCL device (upload step assumed).
            frame_ocl.upload(frame);

            cv::ocl::cvtColor(frame_ocl, frameGray_ocl, cv::COLOR_BGR2GRAY);

            cv::ocl::Sobel(frameGray_ocl, frameSobelx_ocl, frameGray_ocl.depth(), 1, 0, 3);
            cv::ocl::Sobel(frameGray_ocl, frameSobely_ocl, frameGray_ocl.depth(), 0, 1, 3);

            cv::ocl::bitwise_or(frameSobelx_ocl, frameSobely_ocl, frameGray_ocl);

            for (int n = 0; n < nBlurs; n++) {
                cv::ocl::blur(frameGray_ocl, blurredSobel_ocl, cv::Size(3,3));
            }

            // Download the result back to host memory.
            blurredSobel = blurredSobel_ocl;
        } else {
            cv::cvtColor(frame, frameGray, cv::COLOR_BGR2GRAY);

            cv::Sobel(frameGray, frameSobelx, frameGray.depth(), 1, 0, 3);
            cv::Sobel(frameGray, frameSobely, frameGray.depth(), 0, 1, 3);

            cv::bitwise_or(frameSobelx, frameSobely, frameGray);

            for (int n = 0; n < nBlurs; n++) {
                cv::blur(frameGray, blurredSobel, cv::Size(3,3));
            }
        }

        // Processing time only: display and key handling come after this point.
        t = ((double) cv::getTickCount() - t) / cv::getTickFrequency();
        total_time += t;
        std::cout << "Times passed in seconds: " << t << " ; FPS: " << (1/t) << " ; Average FPS=" << (cpt/total_time) << std::endl;

        cv::imshow("Sobel blurred Frame", blurredSobel);

        // Esc (key code 27) exits the loop.
        c = cv::waitKey(30);
    } while(c != 27);

    return 0;
}

My results (pass 1 as an argument to use the OpenCL version; otherwise the CPU version runs):

  • CPU version=~24 FPS ; CPU load=~92%
  • GPU version=~70 FPS ; CPU load=~8% ; GPU load=~7%

Also, you can use a program to monitor the GPU load (e.g. GPU-Z). Mine is about 7% when the program is running.

Edit: Additional information:

  • Windows platform + Visual Studio 2010 + Release mode
  • To set the GPU device: add the environment variable OPENCV_OPENCL_DEVICE with, for example, the value :GPU:1
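
To make the :GPU:1 value easier to read: as far as I understand OpenCV's convention, OPENCV_OPENCL_DEVICE takes the form <Platform>:<Type>:<Device name or ID>, where an empty field means "no preference". The sketch below only illustrates that layout by splitting a value into its three fields; OpenCLDeviceSpec and parseDeviceSpec are hypothetical names, and this is not how OpenCV itself parses the variable.

```cpp
#include <sstream>
#include <string>

// Hypothetical illustration of the assumed "<Platform>:<Type>:<Device>" layout.
struct OpenCLDeviceSpec {
    std::string platform;  // empty: any platform
    std::string type;      // e.g. "GPU" or "CPU"; empty: no preference
    std::string device;    // device name or index; empty: no preference
};

// Splits a value such as ":GPU:1" on ':' into at most three fields.
// Missing trailing fields are left empty.
OpenCLDeviceSpec parseDeviceSpec(const std::string& value) {
    OpenCLDeviceSpec spec;
    std::string* fields[] = {&spec.platform, &spec.type, &spec.device};
    std::istringstream in(value);
    for (std::string* field : fields) {
        if (!std::getline(in, *field, ':'))
            break;
    }
    return spec;
}
```

Under that assumption, ":GPU:1" reads as "any platform, a GPU-type device, device index 1", which would explain why it selects the second GPU on a laptop with both an integrated and a dedicated chipset.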


I retried, this time with OpenCV 3.0.0 from master (as of 26/03/2015), built with VS2010, 64-bit, release mode and without IPP. With this configuration I get coherent results:

  • setUseOpenCL(false): ~24 FPS ; ~8% for CPU load
  • setUseOpenCL(true): ~200 FPS ; ~15% for CPU load ; ~< 10% for GPU load

To calculate the FPS, I measure only the time taken to process the image (excluding the time to acquire it and the time to display it). I still have to set the ...




Thanks for your answer. I will test this algorithm in OpenCV 2.4 today and I'll share the results later.

Pedro Batista (2015-03-26 05:14:28 -0500)

Could you tell me your CPU usage both with and without OpenCL in this example?

Pedro Batista (2015-03-26 06:11:01 -0500)

OpenCV 2.4.10:

Non-OpenCL version: CPU load=~92%

OpenCL version: CPU load=~8%

Eduardo (2015-03-26 08:37:58 -0500)

@Eduardo nice results. Can you tell me how you set the correct device in OpenCV 3.0.0?

I'm not sure what changed between the first and the second test (in OpenCV 3.0.0); can you explain that?

Pedro Batista (2015-03-27 05:05:08 -0500)

To set the correct device, I just add an environment variable with name=OPENCV_OPENCL_DEVICE and value=:GPU:1

What changed? I have two graphics chipsets on my laptop (one integrated, Intel HD Graphics, and one dedicated NVidia). I think that on the first try it used CPU / Intel HD Graphics, and on the second try (once I had set the env var) it used CPU / NVidia for the non-OpenCL / OpenCL versions.

In your case, I don't think that setting the device will change anything because you have only one graphics chipset (the NVidia card), but you can try it.

Eduardo (2015-03-27 08:55:40 -0500)

@Eduardo OK, I tested the version for OpenCV 3.0.0 with IPP enabled and the results are as follows:

  • setUseOpenCL(false): ~22 FPS ; ~6% CPU load and ~11% GPU load
  • setUseOpenCL(true): ~32 FPS ; ~19% CPU load ; ~22% GPU load

Taking into account that my computer is quite old (6-7 years), I would say the results seem quite OK.

theodore (2015-03-27 18:24:26 -0500)

@Eduardo Hi, I've tested your code in OpenCV 3.0 and got some good results, e.g. 2.8 FPS on the CPU and 40 FPS on the GPU. But when I test the facedetect demo, the OpenCL performance is bad; it's even slower than the CPU. The platform is the same in both cases: Intel HD Graphics (the only GPU device on my computer). What leads to these results? I also have another question. After filtering the image on the device, I copy the frame to the host and then process the frame data. An error is reported in the function deallocate in ocl.cpp.

Anna Lucia (2015-06-09 00:43:46 -0500)

For the difference in performance, this is just a guess, but I think it is because the operation involved here (cv::blur) is easily parallelizable, unlike CascadeClassifier.

For your last question, I don't know; can you post your code?

Eduardo (2015-06-09 03:50:23 -0500)

I have the same performance problems with all the code. Using the code from the original question, I get similar results.

  • setUseOpenCL(false): ~12fps; ~50% load both CPU cores; 0% GPU load;
  • setUseOpenCL(true): ~13fps; ~70% load both CPU cores; 75% GPU load;

I'm also setting OPENCV_OPENCL_DEVICE = :GPU: for the second case. I'm using OpenCV 3.0.0 compiled with mingw on win8.1. CPU: Intel Core 2 Duo E7200; GPU: GeForce 9600GT;

Aleksey Filippov (2015-07-29 08:20:56 -0500)