Android OpenCL DFT vs. CPP Version is very slow! [closed]

asked 2015-11-08 10:31:58 -0500

beniroquai

updated 2015-11-09 16:42:57 -0500

Hey, I've started to learn a bit about Android GPU programming and wanted to implement the DFT with the new T-API in OpenCV 3.0. My device is a Sony XPERIA Z1, which runs OpenCL 1.1 (on Lollipop; I hope that doesn't cause problems? The Khronos website says that the Adreno 330 supports KitKat).

When comparing the two versions, the GPU version takes ~3200 ms and the CPU version ~2800 ms. What could be the issue? Any ideas?


I've changed the code to something simpler:

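UMat uIn, uOut, uTmp, uEdges, uBlur;
Mat input = imread( path+filename, IMREAD_GRAYSCALE );//.getUMat( ACCESS_FAST );
input.copyTo( uIn );  // upload the image; uIn was otherwise left empty

clock_t startTimer = clock();  // timer placement is assumed, it was not shown
GaussianBlur(uIn, uBlur, Size(1, 1), 1.5, 1.5);
Canny(uBlur, uEdges, 0, 30, 3);
imwrite(path+filename_result, uEdges);
clock_t stopTimer = clock();

// elapsed time in milliseconds
double elapse = 1000.0* (double)(stopTimer - startTimer)/(double)CLOCKS_PER_SEC;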

Running the code the first time is slower than the second time, but it still takes exactly the same time as the CPU implementation.

Any ideas?

Closed for the following reason: the question is answered, right answer was accepted, by sturkmen
close date 2020-09-27 08:57:05.618527


Sometimes the first call to OpenCL takes very long because the kernel source code has not been compiled yet. In your test, can you measure the time for a second call?

LBerger ( 2015-11-08 15:24:15 -0500 )
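One way to follow this suggestion is to time several consecutive runs and look at the first one separately. The sketch below is only an illustration: `timeRunsMs` is a hypothetical helper, not part of OpenCV, and it uses `std::chrono::steady_clock` for wall-clock time, since `clock()` reports processor time rather than elapsed time.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <vector>

// Hypothetical helper (not an OpenCV function).
// Runs `work` `runs` times and returns the wall-clock duration of each run in
// milliseconds. Element 0 is the cold run, which would include any one-time
// setup such as OpenCL kernel compilation; later elements are warm runs.
std::vector<double> timeRunsMs(const std::function<void()>& work, int runs)
{
    assert(runs > 0);
    std::vector<double> ms;
    ms.reserve(runs);
    for (int i = 0; i < runs; ++i) {
        const auto t0 = std::chrono::steady_clock::now();
        work();
        const auto t1 = std::chrono::steady_clock::now();
        ms.push_back(std::chrono::duration<double, std::milli>(t1 - t0).count());
    }
    return ms;
}
```

Wrapping the UMat pipeline from the question in a lambda and calling, say, `timeRunsMs(pipeline, 5)` would then allow comparing element 0 against the rest. The lambda should end with a step that needs the finished result, such as the `imwrite` call, so that the measured time covers the whole computation.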

Yeah, I did that. It halves the time but is then still slower than the CPU version, or even worse, gives me the error listed above. I think I need to deallocate something? OpenCV Error: Assertion failed (u->refcount == 0 || u->tempUMat()) in virtual void cv::ocl::OpenCLAllocator::upload(.. BTW, I was able to compile the tutorial, and the T-API in general works quite well. Is it an error with imread/imwrite? Using frames generated by the camera from a GLTexture works quite well. Look here

beniroquai ( 2015-11-09 00:24:59 -0500 )

I don't think that's the problem, but maybe you can use imread( path+filename, IMREAD_GRAYSCALE ).getUMat( ACCESS_READ ); to read your image.

LBerger ( 2015-11-09 01:11:56 -0500 )

Thank you for your comment, but still nothing has changed. Would it make sense to read the image as an OpenGL texture and process it as Tutorial 4 suggests? I should say that the tutorial with OpenGL works perfectly fine and really fast, but loading the images with imread() simply doesn't work. I don't know what I'm missing..

beniroquai ( 2015-11-09 13:56:10 -0500 )

I've changed the code to a simpler algorithm: a Gauss and Canny comparison between a UMat and a Mat program. I've found that the first run of the code is slower than the second. What could be the reason for that? Also, the computation time is still (exactly) the same as on the CPU. Can I somehow find out whether the code actually runs on the GPU of the smartphone? The GPU works, which I tested with the tutorial.. Any ideas?

beniroquai ( 2015-11-09 16:38:49 -0500 )

Be careful with OpenCL Canny; it has been buggy for at least about a year.

drVit ( 2015-11-09 17:39:11 -0500 )

Ok. You're right. I've changed Canny to Laplacian, and running the code a second time changes its computation time from ~400 ms to ~20 ms, which is really impressive! Too bad that it only works with this algorithm. Do you think the OCL versions of the algorithms (e.g. DFT, Canny, etc.) will become stable within about a year?

Another question: why is the first "run" still slow (twice as long as the CPU version) and only the second "run" actually faster? Is it because the kernel needs to be created on the GPU side? Do you have any good resources on that? And also a list of "stable" functions? Thank you very much!

beniroquai ( 2015-11-10 00:28:38 -0500 )

Thanks @drVit for pointing out this bug. I don't know of any resources about the slow first OCL run, but I understand it like this: many GPU platforms exist, so it is difficult to ship object code for every one of them. The kernel code therefore has to be compiled at run time before it can run.

LBerger ( 2015-11-10 01:17:08 -0500 )
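The compile-before-run behaviour described in this comment can be pictured with a toy model. The sketch below is not OpenCV's or OpenCL's actual implementation: `KernelCache`, `Compiled` and `Builder` are hypothetical names, and the builder function merely stands in for a runtime compiler. It only shows why the first request for a kernel pays a build cost that later requests skip.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <utility>

// Toy model of a build-on-first-use kernel cache (hypothetical, for
// illustration only). `Compiled` stands in for a device-specific binary and
// `Builder` for the runtime compiler that produces it from kernel source.
class KernelCache {
public:
    using Compiled = std::string;
    using Builder = std::function<Compiled(const std::string&)>;

    explicit KernelCache(Builder build) : build_(std::move(build))
    {
        assert(static_cast<bool>(build_));
    }

    // Returns the compiled form of `source`, building it only on the first
    // request; later requests for the same source reuse the cached result.
    const Compiled& get(const std::string& source)
    {
        auto it = cache_.find(source);
        if (it == cache_.end()) {
            ++builds_;  // the expensive step, paid once per distinct source
            it = cache_.emplace(source, build_(source)).first;
        }
        return it->second;
    }

    // Number of times the builder has been invoked so far.
    int builds() const { return builds_; }

private:
    Builder build_;
    std::map<std::string, Compiled> cache_;
    int builds_ = 0;
};
```

In this model, requesting the same source twice triggers one build, which mirrors the slow first run and fast second run reported above, while a different algorithm (a different source) would trigger its own first-time build.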