First use of UMat does not run on GPU?

asked 2015-10-10 08:16:05 -0600

blDraX gravatar image

updated 2015-10-11 02:21:29 -0600

I experience some very strange behaviour using UMats. My intent is to speedup my algorithms by running OpenCV library functions on my GPU (AMD HD 7850, OpenCL capable). In order to test this I load a set of seven images and perform a bilateral filter or a sobel operation on them.

However, it seems that every time I use one of those functions with a new set of parameters it is executed on the CPU first. Only starting from the second use of those same parameters my program uses the GPU. I compiled this with VS 2013 and OpenCV 3.0 gold.

For example, using the same bilateral filter on all images:

#include <opencv2/highgui/highgui.hpp>
#include <opencv2/imgproc/imgproc.hpp>
#include <opencv2/core/ocl.hpp>

#include <iostream>
#include <chrono>

using namespace std;
using namespace cv;

int main()
{
    cout << "Have OpenCL?: " << cv::ocl::haveOpenCL() << endl; // returns true, OCL available
    ocl::setUseOpenCL(true);

    // Load images test1.jpg, ..., test7.jpg
    vector<UMat> images;
    for (int i = 1; i <= 7; i++)
    {
        string filename = "test" + to_string(i) + ".jpg";
        UMat input = imread(filename).getUMat(ACCESS_READ);
        images.push_back(input);
    }

    for (int i = 0; i < 7; i++)
    {
        chrono::high_resolution_clock::time_point begin = chrono::high_resolution_clock::now();

        // ---------------------- Critical section --------------------------
        UMat result;
        bilateralFilter(images.at(i), result, 0, 10, 3);
        // ------------------------------------------------------------------

        chrono::high_resolution_clock::time_point end = chrono::high_resolution_clock::now();
        long long ms = chrono::duration_cast<chrono::milliseconds>(end - begin).count();
        cout << ms << " ms" << endl;
    }
}

Output:

2251 ms
5 ms
5 ms
5 ms
5 ms
5 ms
5 ms

The GPU utilization goes up, however only after about 2 seconds (i.e. after the first iteration is completed). However, when using a different set of parameters each time:

        // ...
        UMat result;
        bilateralFilter(images.at(i), result, 0, i * 10, 3);
        // ...

Output:

2148 ms
2098 ms
1803 ms
1699 ms
1826 ms
1760 ms
1766 ms

And all of it is executed on my CPU. Also, those functions run extremely slow. Using Mat instead of UMat only takes these operations about 40ms. I guess there's some crosstalk between the program and OpenCL until the library decides to use the CPU.

The same behaviour shows when using Sobel:

        // ...
        UMat result;
        if (i == 0)
            cv::Sobel(images.at(i), result, CV_32F, 1, 0, 5);
        else if (i == 1)
            cv::Sobel(images.at(i), result, CV_32F, 0, 1, 5);
        else
            cv::Sobel(images.at(i), result, CV_32F, 1, 1, 5);
        // ...

The first three operations are executed on the CPU. Then, iterations 4 to 7 finish on the GPU almost immediately, with the GPU utilization once again going up (because they use the same parameter set as iteration 3). Output:

687 ms
567 ms
655 ms
0 ms
0 ms
1 ms
0 ms

Is this a bug? Am I doing something wrong? Just applying each operation once at the start of the program in order to prevent this feels very hacky. Also I don't know how long the parameter usages are "cached" (I use this word since I have no idea what happens in the background ... (more)

edit retag flag offensive close merge delete

Comments

I have tested your program with VS2013 and opencv 3.0-dev . I haven't got any problem. May be you should change your code with this :

for (int i = 0; i <7; i++)
{
    string filename = "test" + to_string(i) + ".jpg";
    UMat input;
    input=UMat::zeros(2048+256*i,2048,CV_8UC1);
    randn(input,0,3);
    images.push_back(input);
}

I think anybody would be able to test.

My result is

*Have OpenCL?: 1

[2048 x 2048] 15 ms

[2048 x 2304] 15 ms

[2048 x 2560] 15 ms

[2048 x 2816] 15 ms

[2048 x 3072] 31 ms

[2048 x 3328] 15 ms

[2048 x 3584] 31 ms*

LBerger gravatar imageLBerger ( 2015-10-10 12:58:17 -0600 )edit

Very strange... how does this even happen? I also compiled the program with VS2013. I used OpenCV 3.0 gold. I also tested the program on my desktop pc and my notebook and on both I get the same result.

I also tested your suggestion but it didn't change anything. The problem don't seem to be the images. The problem is always the first use of a library function with a specific set of parameters.

I'll try it on another different pc later.

blDraX gravatar imageblDraX ( 2015-10-11 02:28:31 -0600 )edit

A difference VS2013 opencv 3.0 but GeForce GTX 970. May be you can add this lines only to check other config parameters :

cv::ocl::Context context;
if (!context.create(cv::ocl::Device::TYPE_GPU))
{
    cout << "Failed creating the context..." << endl;
    return 0;
}
cout << context.ndevices() << " GPU devices are detected." << endl;
for (int i = 0; i < context.ndevices(); i++)
{
    cv::ocl::Device device = context.device(i);
    cout << "name                 : " << device.name() << endl;
    cout << "available            : " << device.available() << endl;
    cout << "imageSupport         : " << device.imageSupport() << endl;
    cout << "OpenCL_C_Version     : " << device.OpenCL_C_Version() << endl;
    cout << endl;
}
LBerger gravatar imageLBerger ( 2015-10-11 02:45:18 -0600 )edit

Output:

1 GPU devices are detected.
name                 : Pitcairn
available            : 1
imageSupport         : 1
OpenCL_C_Version     : OpenCL C 1.2

I don't really see a problem unfortunately. Do you?

blDraX gravatar imageblDraX ( 2015-10-11 03:10:38 -0600 )edit

No but you can try debug mode to understand what happens

LBerger gravatar imageLBerger ( 2015-10-11 03:23:12 -0600 )edit

So I tracked the problem through the debugger.

During the creation of the Kernel in ocl_bilateralFilter_8u (smooth.cpp, line 2965; Kernel is built in line 3030), OpenCV seems to build the OpenCL Program in ocl.cpp, line 3234.

The problem finally occurs in getProg (ocl.cpp, line 2580) in the line

Program prog(src, buildflags, errmsg); //ocl.cpp, line 2589

This finally calls

            retval = clBuildProgram(handle, n,
                                (const cl_device_id*)deviceList,
                                buildflags.c_str(), 0, 0);

in ocl.cpp, line 3499. The debugger doesn't let me go deeper but this operation is what takes up all the time. Since I can't go into the function I have no idea what to do or what's going wrong.

blDraX gravatar imageblDraX ( 2015-10-11 04:33:45 -0600 )edit

I think It's take time to build opencl program on your graphics card. srclen value is 2444 (line 3484 of ocl.cpp)

LBerger gravatar imageLBerger ( 2015-10-11 05:15:29 -0600 )edit