Android OpenCL DFT vs. C++ version is very slow! [closed]
Hey, I've started to learn a bit about Android GPU programming and wanted to implement the DFT with the new T-API in OpenCV 3.0. My device is a Sony XPERIA Z1, which runs OpenCL 1.1 (on Lollipop - I hope that doesn't cause problems? The Khronos website says that the Adreno 330 supports KitKat).
When comparing the two versions, the GPU version takes ~3200 ms and the CPU version ~2800 ms. What could be the issue? Any ideas?
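For context, here is a minimal sketch of what such a T-API DFT might look like (the original DFT code is not shown in the question, so this is an assumed reconstruction; `path` and `filename` are placeholders):

```
#include <opencv2/opencv.hpp>
using namespace cv;

// Minimal T-API DFT sketch: operations on UMat are dispatched to the
// OpenCL device automatically when one is available.
UMat uSrc, uFloat, uSpectrum;
imread(path + filename, IMREAD_GRAYSCALE).copyTo(uSrc); // copy pixels into a UMat
uSrc.convertTo(uFloat, CV_32F);                         // dft needs floating-point input
dft(uFloat, uSpectrum, DFT_COMPLEX_OUTPUT);             // runs via OpenCL if enabled
```

(For best DFT performance the input is normally padded to a size from `getOptimalDFTSize()`; that step is omitted here for brevity.)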
UPDATE
I've changed the code to something easier:
```
UMat uIn, uOut, uTmp, uEdges, uBlur;

Mat input = imread( path+filename, IMREAD_GRAYSCALE );//.getUMat( ACCESS_FAST );
input.copyTo(uIn);                          // upload the pixels into a UMat (OpenCL buffer)

startTimer = clock();
GaussianBlur(uIn, uBlur, Size(1, 1), 1.5, 1.5);
Canny(uBlur, uEdges, 0, 30, 3);
stopTimer = clock();

imwrite(path+filename_result, uEdges);      // forces a download from the device
cv::ocl::finish();                          // waits for any queued OpenCL work

double elapse = 1000.0 * (double)(stopTimer - startTimer) / (double)CLOCKS_PER_SEC;
```
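One thing worth noting about this measurement: T-API calls can be enqueued asynchronously, and `clock()` measures CPU time rather than wall time, so the interval between `startTimer` and `stopTimer` may not cover the actual GPU work. A minimal sketch of the same pipeline with wall-clock timing and a synchronization point before the clock is stopped:

```
int64 t0 = getTickCount();
GaussianBlur(uIn, uBlur, Size(1, 1), 1.5, 1.5);
Canny(uBlur, uEdges, 0, 30, 3);
cv::ocl::finish();   // wait for the queued OpenCL work before stopping the clock
int64 t1 = getTickCount();
double ms = 1000.0 * (double)(t1 - t0) / getTickFrequency();
```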
Running the code the first time is slower than the second time, but it takes exactly the same time as the CPU implementation. Any ideas?
Sometimes the first call into OpenCL is very slow because the kernel source code has not been compiled yet. In your test, can you measure the time of a second call?
Yeah, I did that. It halves the time but is then still longer than the CPU version - or, even worse, gives me the following error. I think I need to deallocate something?
OpenCV Error: Assertion failed (u->refcount == 0 || u->tempUMat()) in virtual void cv::ocl::OpenCLAllocator::upload(..
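A common cause of this assertion (assumed here, since the full code isn't shown) is that a `Mat` header still references the buffer of a `UMat` that OpenCV is about to write into again, for example a header obtained via `getUMat()`/`getMat()` that outlives the loop iteration. A minimal sketch of a scoping pattern that avoids it:

```
UMat uIn;
{
    // copyTo uploads a fresh copy, so no Mat header keeps a
    // reference to uIn's device buffer afterwards.
    Mat input = imread(path + filename, IMREAD_GRAYSCALE);
    input.copyTo(uIn);
}   // 'input' is released here, before uIn is reused

// Riskier alternative: uIn = input.getUMat(ACCESS_READ) makes the UMat
// borrow the Mat's memory, so both objects must stay alive together.
```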
BTW, I was able to compile the tutorial, and the T-API in general works quite well. Is it an error with imread/imwrite? Using frames generated by the camera from a GLTexture works quite well. Look here.

I don't think that's a problem, but maybe you can use `imread( path+filename, IMREAD_GRAYSCALE ).getUMat( ACCESS_READ );` to read your image.
Thank you for your comment, but still nothing has changed. Would it make sense to read the image as an OpenGL texture and process it the way Tutorial 4 suggests? I have to say that the tutorial with OpenGL works perfectly fine and really fast, but loading the images with `imread()` simply doesn't work. I don't know what I'm missing..

I've changed the code to a simpler algorithm (`Gauss` and `Canny`) to compare the UMat and Mat programs. I've figured out that running the code the first time is slower than the second time. What could be the reason for that? Also, the computational time is still (exactly) the same as on the CPU. Can I somehow figure out whether the code actually runs on the GPU of the smartphone? The GPU itself works, which I tested with the tutorial.. Any ideas?

Be careful with the OpenCL Canny, it has been buggy for at least ~one year.
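On the question above about verifying that the pipeline actually runs on the GPU: the T-API exposes this directly. A minimal sketch (the helper name is mine; the `cv::ocl` calls are the OpenCV 3.0 API):

```
#include <opencv2/core/ocl.hpp>
#include <iostream>

void reportOpenCLStatus()
{
    if (cv::ocl::haveOpenCL() && cv::ocl::useOpenCL())
    {
        cv::ocl::Device dev = cv::ocl::Device::getDefault();
        std::cout << "OpenCL device: " << dev.name()
                  << " (" << dev.vendorName() << ")" << std::endl;
    }
    else
    {
        std::cout << "Running on the CPU fallback" << std::endl;
    }
}
```

Calling `cv::ocl::setUseOpenCL(false)` and re-timing is also a quick way to compare the two paths on the same build.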
Ok, you're right. I've changed `Canny` to `Laplacian`, and running the code a second time changes its computational time from ~400 ms to ~20 ms, which is really impressive! Too bad it only works with this algorithm. Do you think the OCL versions of the algorithms (i.e. DFT, Canny, etc.) will become stable within about a year?

Another question: why is the first run still slow (twice as long as the CPU version), and only the second run actually speeds things up? Is it because the kernel needs to be created on the GPU side? Do you have any good resources on that? And also a list of "stable" functions? Thank you very much!
Thanks @drVit for pointing out this bug. About the OCL first-run cost: I don't know of resources on it, but my understanding is this: many GPU platforms exist, so it's difficult to ship precompiled object code for every one of them. The kernel code therefore has to be compiled before it can run.
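That matches the usual advice: since the kernels are compiled from source at runtime, do one untimed warm-up pass so the one-off compile cost doesn't land in the measurement. A minimal sketch, reusing the variables from the snippet above:

```
// Warm-up pass: triggers runtime compilation/caching of the OpenCL kernels.
GaussianBlur(uIn, uBlur, Size(1, 1), 1.5, 1.5);
Laplacian(uBlur, uEdges, CV_8U);
cv::ocl::finish();

// Timed pass: the kernels are already compiled, so this measures steady-state cost.
int64 t0 = getTickCount();
GaussianBlur(uIn, uBlur, Size(1, 1), 1.5, 1.5);
Laplacian(uBlur, uEdges, CV_8U);
cv::ocl::finish();
double ms = 1000.0 * (double)(getTickCount() - t0) / getTickFrequency();
```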