Hello, I'm using OpenCV 3.1.0 with CUDA and Qt on Fedora 23 (Intel Xeon [email protected] GHz x2 CPU, Nvidia Quadro 600, 4 GB RAM), and I'm concerned about convolution speed. My test code shows that filter2D convolution of an image with a 3x3 kernel is much faster than CUDA convolve as long as the image is not too big (the threshold is around 1280x1024). Surprisingly, it is also always faster than separable convolution (first with a 3x1 kernel, then with a 1x3 kernel); from theory I was expecting the separable version to take about 2/3 of the processing time (3+3 rather than 3x3). Moreover, the output image from CUDA convolve is smaller than the original one, whereas from the documentation I was expecting the same size.
Is there anything wrong in what I'm doing? Any suggestions to speed up convolution for images around 640x480? You can find below the test code I used:
cv::cuda::GpuMat temp2; // ---- is a B/W image different size
//-----fill up the temp2 image
....
//---------------------------
Mat dst_x;
Mat dst_x1;
Mat dst_x2;
Mat tmp_2;
cv::cuda::GpuMat fx;
Mat kernel_x = (Mat_<double>(3,3) << 2, 0, -2, 4, 0, -4, 2, 0, -2);
Mat kernel_x1 = (Mat_<double>(3,1) << 2, 4, 2); //----separate x convolution
Mat kernel_x2 = (Mat_<double>(1,3) << 1, 0, -1);
temp2.download(tmp_2);
//----separated filter2D convolution (3x1 then 1x3)---------
int64 t1 = getTickCount();
cv::filter2D(tmp_2, dst_x1, -1, kernel_x1);
cv::filter2D(dst_x1, dst_x2, -1, kernel_x2);
int64 t2 = getTickCount();
std::cout << "Time passed in ms: " << (((t2 - t1) / 1e9)*1000.) << std::endl;
//----filter2D convolution (3x3)---------
t1 = getTickCount();
cv::filter2D(tmp_2, dst_x, -1, kernel_x);
t2 = getTickCount();
std::cout << "Time passed in ms: " << (((t2 - t1) / 1e9)*1000.) << std::endl;
//----CUDA convolution---------
kernel_x.convertTo(kernel_x, CV_32FC1);
t1 = getTickCount();
Ptr<cuda::Convolution> convolver = cuda::createConvolution(Size(3, 3));
convolver->convolve(temp2, kernel_x, fx);
t2 = getTickCount();
std::cout << "Time passed in ms: " << (((t2 - t1) / 1e9)*1000.) << std::endl;
//----END CUDA convolution---------
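As a sanity check on the kernels in the test code above, here is a standalone sketch (plain C++, no OpenCV; the helper names outerProduct, fullKernelMults and separableMults are made up for illustration). It checks that the 3x1 and 1x3 kernels multiply out to the 3x3 kernel_x, and spells out where the expected 2/3 ratio comes from. It only counts multiplications per output pixel and says nothing about memory traffic or per-call overhead, which may well matter more in practice.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Outer product of a column kernel (k x 1) and a row kernel (1 x k).
// If a 2-D kernel equals this product, it can be split into the two 1-D kernels.
Matrix outerProduct(const std::vector<double>& column,
                    const std::vector<double>& row) {
    assert(!column.empty() && !row.empty());
    Matrix result(column.size(), std::vector<double>(row.size(), 0.0));
    for (std::size_t r = 0; r < column.size(); ++r) {
        for (std::size_t c = 0; c < row.size(); ++c) {
            result[r][c] = column[r] * row[c];
        }
    }
    return result;
}

// Multiplications per output pixel for one pass with a full k x k kernel.
constexpr int fullKernelMults(int k) { return k * k; }

// Multiplications per output pixel for two 1-D passes (k x 1, then 1 x k).
constexpr int separableMults(int k) { return 2 * k; }
```

Calling outerProduct({2, 4, 2}, {1, 0, -1}) gives the rows (2, 0, -2), (4, 0, -4), (2, 0, -2), i.e. kernel_x, and separableMults(3) / fullKernelMults(3) is 6/9 = 2/3.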
I can sum up the results as follows:
Image size (30,40) (rows,cols):
- filter2D convolution with kernel size (3,3): 0.083827 ms, output image same size
- filter2D separated convolution with kernel size (1,3) and (3,1): 0.044761 ms, output image same size
- CUDA convolve convolution with kernel size (3,3): 5.95849 ms, output image size (28,38)

Image size (118,158):
- filter2D convolution with kernel size (3,3): 0.204968 ms, output image same size
- filter2D separated convolution with kernel size (1,3) and (3,1): 0.27658 ms, output image same size
- CUDA convolve convolution with kernel size (3,3): 7.03869 ms, output image size (116,156)

Image size (469,629):
- filter2D convolution with kernel size (3,3): 2.51682 ms, output image same size
- filter2D separated convolution with kernel size (1,3) and (3,1): 5.72645 ms, output image same size
- CUDA convolve convolution with kernel size (3,3): 9.31991 ms, output image size (467,627)
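Each CUDA output size reported above is 2 smaller than the input in both dimensions. That is the size one gets when a 3x3 kernel is only evaluated where it fits entirely inside the image, with no border padding. A minimal sketch of that arithmetic, in plain C++ with a hypothetical helper validExtent (not an OpenCV function):

```cpp
#include <cassert>

// Output extent along one axis when the kernel must lie fully inside the
// image (no border padding): imageExtent - kernelExtent + 1.
inline int validExtent(int imageExtent, int kernelExtent) {
    assert(kernelExtent >= 1 && kernelExtent <= imageExtent);
    return imageExtent - kernelExtent + 1;
}
```

For the three test images this gives validExtent(30, 3) = 28 and validExtent(40, 3) = 38, then 116 and 156, then 467 and 627, matching the sizes I measured.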