OpenCL latency at first call

asked 2016-10-06 04:09:21 -0500

When using OpenCL in OpenCV 3.1 (so using UMat instead of Mat), I observe significant latency in the first call of a function.

The project I am working on has performance requirements and will be run as independent processes on the same machine with the same filters/settings (only the input image will change).

I am using OpenCV 3.1 in VS2015.

For example:

UMat input, output;
input = UMat::zeros(2048, 2048, CV_8UC1);
randn(input, 0, 3);  // fill with Gaussian noise (mean 0, stddev 3)
for (int i = 0; i < 10; i++)
{
    // Time each GaussianBlur call separately
    chrono::high_resolution_clock::time_point begin = chrono::high_resolution_clock::now();
    GaussianBlur(input, output, Size(5, 5), 1.5, 1.5);
    chrono::high_resolution_clock::time_point end = chrono::high_resolution_clock::now();
    long long ms = chrono::duration_cast<chrono::milliseconds>(end - begin).count();
    cout << ms << " ms" << endl;
}

This will give the following result:

144 ms
0 ms
0 ms
0 ms
0 ms
0 ms
0 ms
0 ms
0 ms
0 ms

I checked whether OpenCL was enabled using ocl::haveOpenCL(), and explicitly enabling it with ocl::setUseOpenCL(true) does not make a difference.

This issue was already addressed in this question and this one but not yet solved.

One of the causes of this latency could be the initialization of the OpenCL runtime at the first call. Is it possible to save the initialized state of OpenCL to a file on the first run of my program and load it on subsequent runs? I know the ocl module has classes like Program, ProgramSource and Context. However, I am not familiar with OpenCL in general, and I could not find any documentation on these classes or on how to use their members.



My hardware: i7-4710MQ, 8 GB RAM, Intel HD Graphics 4600 & AMD FirePro M6100, Win10 x64

Schutze (2016-10-06 04:13:11 -0500)

Can you use multiple threads instead of independent processes?

LBerger (2016-10-06 04:58:18 -0500)

Unfortunately not. The data will arrive irregularly and the program will be called on demand.

Schutze (2016-10-06 07:11:41 -0500)

I don't know if it is still a good answer

LBerger (2016-10-06 08:23:12 -0500)

That answer is about sharing memory (image data) between processes that both use the same GPU. In my case, I only need to share the same OpenCL context, not the memory.

Schutze (2016-10-06 09:16:19 -0500)

As far as I know, OpenCL code is built at runtime (at the first function call), so I don't think it can be stored in a file. When you call cv::getBuildInformation(), there is a line where OpenCL is declared as "Dynamic loading of OpenCL library". Perhaps it can be built statically, but then you have the problem that the code won't run on different devices. I have never dealt with this myself, so this is only half-knowledge on my side.

matman (2016-10-06 12:00:29 -0500)

The answer of @matman partially points to the issue. With OpenCL-enabled code, your OpenCL device needs an initialization step. This step is performed the first time an OpenCL command is issued and cannot be avoided. That is why most applications run some OpenCL code at startup, before starting their actual work.

StevenPuttemans (2016-10-07 03:55:36 -0500)
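The warm-up at startup described in the comment above can be sketched roughly as follows. This is a standard-library-only illustration, not OpenCV code: LazyDevice is a made-up stand-in for a runtime with a one-time cost on first use (the 20 ms sleep is an arbitrary placeholder), and warmUp is a hypothetical helper name.

```cpp
#include <chrono>
#include <thread>
#include <utility>

// Stand-in for a device with a one-time setup cost on first use.
// Purely illustrative: the 20 ms sleep is an assumed placeholder for
// whatever the real runtime does on its first call.
struct LazyDevice
{
    bool initialized = false;
    int initCount = 0;

    void run()
    {
        if (!initialized)
        {
            std::this_thread::sleep_for(std::chrono::milliseconds(20));
            initialized = true;
            ++initCount;
        }
        // ... the actual (fast) work would happen here ...
    }
};

// Hypothetical helper: call `op` once at startup so the one-time cost is
// paid before real data arrives. Returns that first-call cost in ms.
template <typename Op>
double warmUp(Op&& op)
{
    using clock = std::chrono::steady_clock;
    const clock::time_point begin = clock::now();
    std::forward<Op>(op)();
    const clock::time_point end = clock::now();
    return std::chrono::duration<double, std::milli>(end - begin).count();
}
```

With OpenCV, the callable would presumably be the real operation applied to a dummy image, for example warmUp([&] { GaussianBlur(dummy, output, Size(5, 5), 1.5, 1.5); }) with dummy being a UMat. Note that this only moves the cost to startup within one process; it does not carry over to a separately launched process, which is the case the question asks about.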

What about running your program as a long-lived server process?

mshabunin (2016-10-07 04:07:06 -0500)