
How to run OpenCL file (example fast.cl) in OpenCV library?

asked 2017-03-19 22:20:02 -0600 by minhntu

Hi guys, does anyone know how to run OpenCL source code from the OpenCV library? I see that there are many kernels in a single file, and I am confused about how to set up the arguments to run them. Thank you very much for your help.


1 answer


answered 2017-03-19 22:48:54 -0600 by Tetragramm

Take a look HERE for an explanation of the Transparent API.
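For example, a minimal sketch of what the Transparent API looks like in practice (the file names and blur parameters here are placeholders):

    #include <opencv2/opencv.hpp>

    int main()
    {
        // Using UMat instead of Mat is all the T-API needs: OpenCV
        // dispatches to its built-in OpenCL kernels automatically when
        // the library was compiled with OpenCL support.
        cv::UMat img, gray;
        cv::imread("input.jpg", cv::IMREAD_COLOR).copyTo(img);
        cv::cvtColor(img, gray, cv::COLOR_BGR2GRAY);
        cv::GaussianBlur(gray, gray, cv::Size(7, 7), 1.5);
        cv::imwrite("output.jpg", gray);
        return 0;
    }

You never set kernel arguments yourself; the library binds them internally when it launches the kernels from the .cl files.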


Comments

I want to run an OpenCL file (for example fast.cl at opencv310\sources\modules\features2d\src\opencl\fast.cl) in the OpenCV library. But I see that this file has many kernels, and each kernel has a different number of arguments. How can I set up the arguments to run this file?

minhntu (2017-03-20 05:06:10 -0600)

You don't need to. You just run the normal FAST function with UMat as the arguments.
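For example, a minimal sketch (the file name and threshold are placeholders):

    #include <opencv2/opencv.hpp>
    #include <vector>

    int main()
    {
        cv::UMat gray;
        cv::imread("chessboard.jpg", cv::IMREAD_GRAYSCALE).copyTo(gray);

        std::vector<cv::KeyPoint> keypointsCorners;
        // Because gray is a UMat, this call takes the OpenCL path
        // (the kernels in fast.cl) when OpenCL is available.
        cv::FAST(gray, keypointsCorners, 20, true);
        return 0;
    }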

Tetragramm (2017-03-20 17:52:11 -0600)

So that means when I change Mat to UMat, the FAST function will run on the GPU? Do you know what the fast.cl file in the OpenCV library is used for?

minhntu (2017-03-21 01:31:35 -0600)

I'm not entirely sure what you're asking. Here is where the fast.cl kernels are called from.

Again, you don't need to do anything but use UMat matrices and have compiled OpenCV with OpenCL support.
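If you want to confirm both conditions at runtime, here is a quick sketch using the cv::ocl module:

    #include <opencv2/core/ocl.hpp>
    #include <iostream>

    int main()
    {
        // haveOpenCL(): an OpenCL runtime and device were found.
        // useOpenCL(): T-API dispatch is currently enabled.
        std::cout << "OpenCL available: " << cv::ocl::haveOpenCL() << std::endl;
        std::cout << "OpenCL enabled:   " << cv::ocl::useOpenCL() << std::endl;
        return 0;
    }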

Tetragramm (2017-03-21 17:40:50 -0600)

It means I only need to change from

Mat img = imread("chessboard.jpg", IMREAD_UNCHANGED);
FAST(gray, keypointsCorners, thresholdCorner, true);

to

cv::ocl::Device(context.device(0)); 
UMat img, gray; 
imread("chessboard.jpg", 1).copyTo(img); 
FAST(gray, keypointsCorners, thresholdCorner, true);

to have OpenCL code, and this code will run on the GPU? And while the FAST() function is executing, will it call the fast.cl file internally?

minhntu (2017-03-21 20:50:56 -0600)

Try to use the code tag in the editor to make it readable.

Yes, so long as your OpenCV was compiled with OpenCL support, it will run on the GPU. Though do note that in the snippet here, you don't actually fill gray; I assume you just left that out for brevity.
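For completeness, one way to fill gray in that snippet (the cvtColor step is my assumption about what was intended):

    UMat img, gray;
    imread("chessboard.jpg", IMREAD_COLOR).copyTo(img);
    cvtColor(img, gray, COLOR_BGR2GRAY);  // fill gray before detecting
    FAST(gray, keypointsCorners, thresholdCorner, true);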

Tetragramm (2017-03-21 20:57:07 -0600)

When I change Mat to UMat, my code runs on the GPU, but it is slower than on the CPU. Also, when I rename the fast.cl file in the path opencv310\sources\modules\features2d\src\opencl\fast.cl, the code still runs normally, so it seems the fast.cl file was not called while the code was executing. How do we know that, when changing Mat to UMat, the code will run in parallel on the GPU?

minhntu (2017-03-22 04:39:54 -0600)

Just to make sure, you are compiling the code? Not using the installer? The installer is not (I think) compiled with OpenCL, so you will see no benefit.

Tetragramm (2017-03-22 21:10:34 -0600)

When I run the code below:

    // context is created earlier in the full program; this line is my
    // assumption about how (it was not shown in the original snippet).
    cv::ocl::Context context;
    context.create(cv::ocl::Device::TYPE_GPU);

    cout << context.ndevices() << " GPU devices are detected." << endl;
    for (int i = 0; i < context.ndevices(); i++)
    {
        cv::ocl::Device device = context.device(i);
        cout << "name                 : " << device.name() << endl;
        cout << "available            : " << device.available() << endl;
        cout << "imageSupport         : " << device.imageSupport() << endl;
        cout << "OpenCL_C_Version     : " << device.OpenCL_C_Version() << endl;
        cout << endl;
    }

It showed these results:

    1 GPU devices are detected.
    name                 : Quadro K2000
    available            : 1
    imageSupport         : 1
    OpenCL_C_Version     : OpenCL C 1.2

And while the code was running, it took 11026 ms with Mat and 29340 ms with UMat for the same work. Do you know why that is?

minhntu (2017-03-23 01:56:34 -0600)

How many iterations? The first iteration may be much slower because of initializing the context and memory. That looks like a decent amount though.

Also, that's a really old card. What CPU do you have? If it's anything recent, it'll be faster than the GPU. Not because of the processing, but just the memory transfer. If you keep the same data on the GPU and do lots of work on it, it's much better than transferring back and forth.
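As a sketch of that idea (the operations and file name are arbitrary, just to show chaining work on the device):

    #include <opencv2/opencv.hpp>

    int main()
    {
        cv::UMat img, gray, edges;
        cv::imread("big_image.jpg", cv::IMREAD_COLOR).copyTo(img);  // one upload to the GPU

        // All intermediates are UMat, so no host/device copies occur
        // between these calls.
        cv::cvtColor(img, gray, cv::COLOR_BGR2GRAY);
        cv::GaussianBlur(gray, gray, cv::Size(5, 5), 1.5);
        cv::Canny(gray, edges, 50, 150);

        cv::Mat result;
        edges.copyTo(result);  // one download back to the host at the end
        return 0;
    }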

Tetragramm (2017-03-23 19:17:02 -0600)

I use while(1) to iterate infinitely. My PC specifications:

- Processor: Intel Xeon CPU E5-1650 v2 @ 3.5 GHz.
- RAM: 16 GB.
- GPU: NVIDIA Quadro K2000.

Here is a video of my FAST corner detector code running on the PC; it shows the source code, the running time, and the CPU and GPU performance. When I run this code on an Odroid XU4 (Samsung processor with four A15 cores at 2 GHz and four A7 cores at 1.3 GHz, 2 GB RAM, Mali T628 GPU), using Mat is still slower than using UMat. Link: https://www.youtube.com/watch?v=PXKJMepjuHg&feature=youtu.be

minhntu (2017-03-23 21:11:54 -0600)

Umm, if you iterate infinitely, then how do your time measurements work? Use a fixed number of iterations to make sure the two tests are equal.

For example, on my machine, 10000 iterations of FAST takes 4.89s on CPU and 3.28s on GPU.

    // (assumes: using namespace cv; using namespace std;
    //  using namespace std::chrono;)
    Mat im = imread("result.png");
    UMat im2;

    cvtColor(im, im, COLOR_BGR2GRAY);
    im.copyTo(im2);

    vector<KeyPoint> kps;
    FAST(im, kps, 20, true);  // warm-up run, not timed

    auto start = high_resolution_clock::now();
    for (int i = 0; i < 10000; ++i)
    {
        FAST(im, kps, 20, true);   // Mat input: CPU path
    }
    auto stop = high_resolution_clock::now();
    cout << "CPU code is " << duration_cast<nanoseconds>(stop - start).count() / 1.0e9 << "\n\n";

    start = high_resolution_clock::now();
    for (int i = 0; i < 10000; ++i)
    {
        FAST(im2, kps, 20, true);  // UMat input: OpenCL path
    }
    stop = high_resolution_clock::now();
    cout << "GPU code is " << duration_cast<nanoseconds>(stop - start).count() / 1.0e9 << "\n";
Tetragramm (2017-03-23 23:12:23 -0600)

I understood the reason. Running code on the GPU is faster than on the CPU when the volume of parallel computation on the GPU is big enough and the total time to transfer data from the CPU to the GPU, process it on the GPU, and transfer the data back from the GPU to the CPU is less than the running time on the CPU. However, I am wondering whether the data transfer from the CPU to the GPU happens at the command im.copyTo(im2) or at the command FAST()? And after the first iteration of FAST(), is the output data still on the GPU, or is it transferred back to the CPU before the next iteration of FAST() begins?

minhntu (2017-03-24 06:56:37 -0600)

The image copy is in the copyTo function, but there is some memory transfer in FAST to bring the keypoints back from the GPU.

Tetragramm (2017-03-24 18:39:11 -0600)

When I run the FAST code using a camera, the running time on the CPU is less than on the GPU. I understand that we incur a one-time overhead in loading the memory for the GPU implementation for each frame. Can I check whether, subsequently (from the second frame onwards), this memory load can be done in parallel with the computation to improve performance? Otherwise, there will be an additional overhead for each new image frame that we are processing.

minhntu (2017-03-26 20:41:33 -0600)

That's certainly a thing to try.

Another thing is how large your image is. If it's small, the overhead of copying and launching the kernels will outweigh the benefits. Larger images suffer from this less.
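If you want to experiment with overlapping the upload and the computation, here is a rough sketch using std::thread (whether the copy actually overlaps the kernel depends on the OpenCL driver; the camera index and threshold are placeholders):

    #include <opencv2/opencv.hpp>
    #include <thread>
    #include <vector>

    int main()
    {
        cv::VideoCapture cap(0);
        cv::Mat frame, nextFrame;
        cv::UMat current, next;

        cap >> frame;
        frame.copyTo(current);  // upload the first frame

        std::vector<cv::KeyPoint> kps;
        while (cap.read(nextFrame))
        {
            // Upload frame N+1 on a worker thread while FAST runs on frame N.
            std::thread uploader([&] { nextFrame.copyTo(next); });
            cv::FAST(current, kps, 20, true);
            uploader.join();
            std::swap(current, next);  // reuse buffers for the next iteration
        }
        return 0;
    }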

Tetragramm (2017-03-26 20:53:22 -0600)

Can I check whether, subsequently (from the second frame onwards), this memory load can be done in parallel with the computation to improve performance?

minhntu (2017-03-26 21:25:06 -0600)
