CUDA and OpenCV performance
Hello,
I have a fairly large project with several image processing parts implemented with OpenCV 3. In general, I am noticing that the CPU code is faster than the parts written with cv::cuda functions. For example, consider these two portions of code:
cv::GaussianBlur( image, image, cv::Size(3,3), 0,0);
and
cv::cuda::GpuMat cuda_image;
cuda_image.upload(image);
cv::Ptr<cv::cuda::Filter> filter = cv::cuda::createGaussianFilter(cuda_image.type(),
cuda_image.type(), cv::Size(3,3), 0, 0);
filter->apply(cuda_image, cuda_image);
image = cv::Mat(cuda_image);
The second one turns out to be much slower. Please note that I ran each portion in a long loop and took the average time (ignoring the first iteration, which is even slower).
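For reference, the measurement described above (long loop, average time, first iteration ignored) could look roughly like the sketch below. It is not the poster's actual harness: the helper name average_ms and its parameters are hypothetical, and it uses only the C++ standard library. It assumes the callable blocks until its result is ready; otherwise only the launch cost would be measured.

```cpp
#include <chrono>
#include <cstddef>
#include <functional>

// Hypothetical helper: runs `work` `warmup` times without timing it
// (the post reports that the first iteration is much slower), then returns
// the mean wall-clock time in milliseconds over `iterations` timed runs.
// Assumption: `work` blocks until its result is available.
double average_ms(const std::function<void()>& work,
                  std::size_t warmup, std::size_t iterations)
{
    // Untimed warm-up runs.
    for (std::size_t i = 0; i < warmup; ++i) work();

    using clock = std::chrono::steady_clock;
    const auto start = clock::now();
    for (std::size_t i = 0; i < iterations; ++i) work();
    const auto stop = clock::now();

    const std::chrono::duration<double, std::milli> elapsed = stop - start;
    // Avoid dividing by zero when no timed iterations were requested.
    return iterations == 0 ? 0.0
                           : elapsed.count() / static_cast<double>(iterations);
}
```

Each of the two portions above would be wrapped in a lambda and passed as `work`, with the GPU variant including the upload and the download so that both measurements cover the same Mat-to-Mat step.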
I understand that in this particular case the communication overhead could be larger than the actual computation time (the image in this example is 1280x720), but in general this happens for every cv::cuda function that I use, even things like solvePnPRansac that do not process images directly. Note that I tested this on a workstation with a Quadro M1200 (and a powerful CPU), but the same occurs on a Tegra TX2, where the CPU is very limited. So far, the biggest improvement has come from using OpenMP (of course, mostly on the workstation).
Given these results, I have some doubts about how OpenCV compiled with CUDA support works. Is it possible that, since I compiled OpenCV with CUDA support, calling functions like cv::GaussianBlur automatically uses the GPU, and in a more efficient way than the second portion of code I posted? If so, what are the guidelines and best practices? Or should I consider this just an overhead problem?
Thank you
cuda will work fine, if you can keep your data on the gpu as long as possible.
if you only have a single operation, the costs for up & downloading will outweigh any profits.
so, what are your "other" operations? you'd want to upload your image once, then use cuda processing all the way down, and only download it again at the end of the pipeline.
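The trade-off in this reply can be put into rough numbers. The sketch below is a simplified cost model, not a measurement: it assumes one upload and one download per pipeline, a constant per-operation time on each side, and ignores kernel launch latency and allocation. All function names are hypothetical, and the channel count and bandwidth are assumptions you would replace with your own values.

```cpp
#include <cmath>
#include <cstddef>

// Size in bytes of an uncompressed image with one byte per channel sample.
std::size_t image_bytes(std::size_t width, std::size_t height,
                        std::size_t channels)
{
    return width * height * channels;
}

// One-way host<->device copy time in milliseconds for an ASSUMED effective
// bandwidth given in GB/s (1 GB = 1e9 bytes).
double transfer_ms(std::size_t bytes, double bandwidth_gb_per_s)
{
    return static_cast<double>(bytes) / (bandwidth_gb_per_s * 1e9) * 1e3;
}

// Smallest number n of chained operations for which
//   upload + n * gpu_op_ms + download  <  n * cpu_op_ms
// holds, with upload == download == one_way_ms.
// Returns 0 when the GPU operation is not faster than the CPU one,
// i.e. the GPU pipeline never pays off in this model.
std::size_t break_even_ops(double one_way_ms, double cpu_op_ms,
                           double gpu_op_ms)
{
    const double saved_per_op = cpu_op_ms - gpu_op_ms;
    if (saved_per_op <= 0.0) return 0;
    // n must be strictly greater than 2 * one_way_ms / saved_per_op.
    const double n = 2.0 * one_way_ms / saved_per_op;
    return static_cast<std::size_t>(std::floor(n)) + 1;
}
```

For the 1280x720 image from the question, assuming 3 channels of 8 bits, image_bytes gives 2,764,800 bytes per copy. With a single cheap operation such as a 3x3 blur, the two copies can easily dominate, which is the point made above: the more operations stay on the GPU between one upload and one download, the more likely the break-even count is reached.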
Yes, that is basically the difference between GPU and CPU.
I understand, it is as I suspected. Unfortunately, many of the other operations are not pure image processing. Even within the image processing part, sometimes I cannot use cv::cuda because no equivalent implementation is available (for example cv::findContours, cv::text::ERFilter, cv::text::erGrouping and so on), which means I would have to download and upload the image several times. My doubt now is what happens if I compile OpenCV with CUDA support and then do not call cv::cuda functions. Are these functions always executed purely on the CPU, or can OpenCV sometimes send the work to the GPU automatically because of its internal implementation, depending of course on the function in question?
I hope it will fall back to using OpenCL (OpenCL also includes GPU support) in this case, but maybe @berak or someone else with better knowledge of the source can comment on this?
Yes, it would be great to get that information. An automatic call to an OpenCL or CUDA implementation in that case would explain everything.