Why are large kernels on CUDA slow?

I am running a recent *buntu with the stock drivers.

My hardware is an i7-7700 and an Nvidia 1080 OC.

I tried OpenCV 3.4.3 and 4.0.0.

I am creating my filter like this:

 cv::Mat k = KernelMaker::muKernel3(p, a); // 100x100
 filter = cv::cuda::createLinearFilter(
    CV_32F, CV_32F, k, cv::Point(-1, -1), cv::BORDER_REFLECT);

and later using it like this:

cv::cuda::GpuMat gbw; //20 megapixels
cv::cuda::GpuMat stmp(gbw.size(), CV_32FC1, cv::Scalar(0));
filter->apply(gbw, stmp, stream);

But it's many times slower than the CPU implementation with cv::filter2D.

If I run it with a small 3x3 dummy kernel, the GPU implementation gets much faster, so I don't think the bottleneck is memory copying or anything like that.
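To put rough numbers on that observation, here is a back-of-the-envelope sketch. It assumes the CUDA linear filter does a direct (spatial) convolution, one multiply-add per kernel tap per output pixel; that is an assumption and has not been checked against the OpenCV source. The helper names `directConvMacs` and `printEstimate` are hypothetical, and "20 megapixels" is taken as exactly 20,000,000 pixels.

```cpp
#include <cassert>
#include <cstdint>
#include <iostream>

// Multiply-adds for a direct (spatial) convolution:
// one per kernel tap per output pixel. Border handling is ignored.
std::uint64_t directConvMacs(std::uint64_t pixels, int kernelRows, int kernelCols) {
    assert(kernelRows > 0 && kernelCols > 0);
    return pixels * static_cast<std::uint64_t>(kernelRows)
                  * static_cast<std::uint64_t>(kernelCols);
}

// Prints the estimate for the two kernel sizes mentioned in the question.
void printEstimate() {
    const std::uint64_t pixels = 20000000ULL;  // assumed: exactly 20e6 pixels

    const std::uint64_t small = directConvMacs(pixels, 3, 3);
    const std::uint64_t large = directConvMacs(pixels, 100, 100);

    std::cout << "3x3 kernel:     " << small << " multiply-adds\n";
    std::cout << "100x100 kernel: " << large << " multiply-adds\n";
    std::cout << "ratio:          " << (large / small) << "x (integer division)\n";
}
```

Calling `printEstimate()` from `main` shows that, under this assumption, the 100x100 kernel needs roughly a thousand times more arithmetic than the 3x3 one with the same memory transfers, which fits the cost scaling with kernel size rather than with copying. It may also matter that the OpenCV documentation for `cv::filter2D` says the CPU function switches to a DFT-based algorithm for sufficiently large kernels (around 11x11 or larger), so the two implementations may not be doing the same amount of work.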

Can I get my large kernel to run faster?
