Hello,
I have a fairly large project with several image-processing parts implemented with OpenCV 3. In general, I am noticing that the CPU code is faster than the parts programmed with cv::cuda functions. For example, consider these two portions of code:
cv::GaussianBlur( image, image, cv::Size(3,3), 0,0);
and
cv::cuda::GpuMat cuda_image;
cuda_image.upload(image);                       // host -> device copy
cv::Ptr<cv::cuda::Filter> filter =
    cv::cuda::createGaussianFilter(cuda_image.type(), cuda_image.type(),
                                   cv::Size(3,3), 0, 0);
filter->apply(cuda_image, cuda_image);
cuda_image.download(image);                     // device -> host copy
it turns out that the second one is much slower. Note that I put each portion in a long for loop before taking the average time (ignoring the first iteration, which is even slower).
I understand that in this particular case the communication overhead could be bigger than the actual computation time (the image in this example is 1280x720), but the same happens, in general, for every cv::cuda function I use, even for functions like solvePnPRansac that do not process images directly. Note that I tested this on a workstation with a Quadro M1200 (and a powerful CPU), but the same occurs on a Tegra TX2, where the CPU is much more limited. So far, the biggest improvement has come from using OpenMP (mostly on the workstation, of course).
Considering these results, I have some doubts about how OpenCV compiled with CUDA support actually works. Is it possible that, since I compiled OpenCV with CUDA support, calling methods like cv::GaussianBlur automatically runs them on the GPU, more efficiently than the second portion of code above? If so, what are the guidelines and best practices? Or should I just consider this an overhead problem?
Thank you