Cuda Convolve VS filter2D openCV 3.1.0

Hello, I'm using OpenCV 3.1.0 with CUDA on an Intel Xeon 5110 @ 1.60 GHz x2 CPU + Nvidia Quadro 600 + 4 GB RAM with Qt on Fedora 23, and I'm concerned about convolution speed. My test code shows that filter2D convolution of an image with a 3x3 kernel is much faster than CUDA Convolve as long as the image is not too big (the threshold is around 1280x1024) and, surprisingly, always faster than separable convolution (first with a 3x1 kernel, then with a 1x3 kernel). From theory I was expecting 2/3 of the processing time (3+3 multiplications per pixel rather than 3x3). Moreover, the output image from CUDA convolve is smaller than the original one; from the documentation I was expecting the same size.
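To make the 2/3 expectation concrete, here is a minimal sketch of the arithmetic in plain C++. The function names are hypothetical and the model counts multiplications only, so it ignores memory traffic, border handling and per-call overhead, any of which may dominate on small images.

```cpp
#include <cstddef>

// Multiplications per output pixel for a direct k x k convolution.
constexpr std::size_t directCost(std::size_t k) { return k * k; }

// Multiplications per output pixel for two 1-D passes (k x 1, then 1 x k).
constexpr std::size_t separableCost(std::size_t k) { return 2 * k; }

// Expected cost of the separable version relative to the direct one,
// counting multiplications only (an assumption of this sketch).
constexpr double expectedRatio(std::size_t k) {
    return static_cast<double>(separableCost(k)) /
           static_cast<double>(directCost(k));
}
```

For k = 3 this gives 6 multiplications against 9, i.e. a ratio of 2/3. The saving grows with kernel size, so for a 3x3 kernel the theoretical gain is small to begin with.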

Is there anything wrong in what I'm doing? Any suggestion to speed up convolution for images around 640x480? You can find below the test code I used:

cv::cuda::GpuMat temp2; // ---- B/W image, size differs between tests

//-----fill up the temp2 image ....
//---------------------------
Mat dst_x;
Mat dst_x1;
Mat dst_x2;
Mat tmp_2;
cv::cuda::GpuMat fx;

Mat kernel_x = (Mat_<double>(3,3) << 2, 0, -2, 4, 0, -4, 2, 0, -2);

//----separate x convolution
Mat kernel_x1 = (Mat_<double>(3,1) << 2, 4, 2);
Mat kernel_x2 = (Mat_<double>(1,3) << 1, 0, -1);

temp2.download(tmp_2);

//----filter2D separated convolution---------
int64 t1 = getTickCount();
cv::filter2D(tmp_2, dst_x1, -1, kernel_x1);
cv::filter2D(dst_x1, dst_x2, -1, kernel_x2);
int64 t2 = getTickCount();
// getTickFrequency() returns ticks per second, so this prints milliseconds
std::cout << "Time passed in ms: " << (((t2 - t1) / getTickFrequency()) * 1000.) << std::endl;

//----filter2D 3x3 convolution---------
t1 = getTickCount();
cv::filter2D(tmp_2, dst_x, -1, kernel_x);
t2 = getTickCount();
std::cout << "Time passed in ms: " << (((t2 - t1) / getTickFrequency()) * 1000.) << std::endl;

//----CUDA convolution---------
kernel_x.convertTo(kernel_x, CV_32FC1);

t1 = getTickCount();
Ptr<cuda::Convolution> convolver = cuda::createConvolution(Size(3, 3));
convolver->convolve(temp2, kernel_x, fx);
t2 = getTickCount();
std::cout << "Time passed in ms: " << (((t2 - t1) / getTickFrequency()) * 1000.) << std::endl;
//----END CUDA convolution---------
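As a sanity check that the two 1-D kernels above really correspond to the 3x3 kernel, the outer product of the 3x1 column and the 1x3 row can be compared with kernel_x. This is a small sketch in plain C++ without OpenCV; `Kernel3x3`, `outerProduct` and `kernelsMatch` are hypothetical names introduced only for this illustration.

```cpp
#include <array>
#include <cstddef>

using Kernel3x3 = std::array<std::array<double, 3>, 3>;

// Outer product of a 3x1 column kernel and a 1x3 row kernel.
Kernel3x3 outerProduct(const std::array<double, 3>& column,
                       const std::array<double, 3>& row) {
    Kernel3x3 result{};
    for (std::size_t r = 0; r < 3; ++r)
        for (std::size_t c = 0; c < 3; ++c)
            result[r][c] = column[r] * row[c];
    return result;
}

// True when the two 1-D kernels from the test code reproduce kernel_x.
bool kernelsMatch() {
    const std::array<double, 3> column = {{2.0, 4.0, 2.0}};   // kernel_x1 (3x1)
    const std::array<double, 3> row    = {{1.0, 0.0, -1.0}};  // kernel_x2 (1x3)
    const Kernel3x3 expected = {{{{2.0, 0.0, -2.0}},
                                 {{4.0, 0.0, -4.0}},
                                 {{2.0, 0.0, -2.0}}}};        // kernel_x (3x3)
    return outerProduct(column, row) == expected;
}
```

Since the kernels agree, the two filter2D variants should produce the same result away from the image borders, so the comparison is at least between equivalent filters.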

I can sum up the results as follows:

Image size (30,40) (rows,cols)
Time passed in ms: 0.083827, filter2D convolution with kernel size (3,3), output image same size
Time passed in ms: 0.044761, filter2D separated convolution with kernel sizes (1,3) and (3,1), output image same size
Time passed in ms: 5.95849, CUDA convolve convolution with kernel size (3,3), output image size (28,38)

Image size (118,158)
Time passed in ms: 0.204968, filter2D convolution with kernel size (3,3), output image same size
Time passed in ms: 0.27658, filter2D separated convolution with kernel sizes (1,3) and (3,1), output image same size
Time passed in ms: 7.03869, CUDA convolve convolution with kernel size (3,3), output image size (116,156)

Image size (469,629)
Time passed in ms: 2.51682, filter2D convolution with kernel size (3,3), output image same size
Time passed in ms: 5.72645, filter2D separated convolution with kernel sizes (1,3) and (3,1), output image same size
Time passed in ms: 9.31991, CUDA convolve convolution with kernel size (3,3), output image size (467,627)
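The CUDA output sizes I measured lose exactly 2 rows and 2 columns each time. That is what one would get if only the positions where the 3x3 kernel fits entirely inside the image are kept. Whether this is how cuda::Convolution is meant to behave is an assumption on my side; the sketch below only reproduces the arithmetic, and `validOutputSize` is a hypothetical helper name.

```cpp
#include <utility>

// Output (rows, cols) when only positions where the kernel fits entirely
// inside the image are kept: size - kernel + 1 along each axis.
std::pair<int, int> validOutputSize(int rows, int cols,
                                    int kernelRows, int kernelCols) {
    return std::make_pair(rows - kernelRows + 1, cols - kernelCols + 1);
}
```

With a 3x3 kernel this maps (30,40) to (28,38), (118,158) to (116,156) and (469,629) to (467,627), which matches the three sizes I measured.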
