The result obtained by cuda::dft is different from cv::dft

asked 2020-07-28 00:58:26 -0600

jianyn

updated 2020-07-29 00:36:08 -0600

I'm trying to speed up cv::dft by using the GPU version, but I find that the result obtained by cv::cuda::dft is different from that of cv::dft.

Here's the code:

CPU version:

Mat t = imread(...); // read the src image
Mat f, dst;
Mat plane_h[] = { Mat_<float>(t), Mat::zeros(t.size(),CV_32F) };
merge(plane_h, 2, t);
merge(plane_h, 2, f);
cv::dft(t, f, DFT_SCALE | DFT_COMPLEX_OUTPUT);
cv::dft(f, dst, DFT_INVERSE | DFT_REAL_OUTPUT);

GPU version:

Mat t = imread(...); // read the src image
cuda::GpuMat t_dev, f_dev, dst_dev;
Mat dst;
t_dev.upload(t);
cuda::GpuMat plane_h[] = { t_dev, cuda::GpuMat(t_dev.size(),CV_32FC1) };
cuda::merge(plane_h, 2, t_dev);
cuda::merge(plane_h, 2, f_dev);
cuda::dft(t_dev, f_dev, t_dev.size(), DFT_SCALE);
cuda::dft(f_dev, dst_dev, t_dev.size(), DFT_COMPLEX_INPUT | DFT_REAL_OUTPUT);
dst_dev.download(dst);

In the CPU version, 'dst' is equal to 't', while in the GPU version 'dst' is totally wrong.

I also found that 'f_dev' in the GPU version is equal to 'f' in the CPU version.


Comments

Which versions of OpenCV and CUDA are you using? Additionally, I cannot get your code to compile. Can you post the exact code you are using? For example, GPU::GpuMat should be cuda::GpuMat. Do you use DFT_SCALE | DFT_COMPLEX_OUTPUT in the CUDA version as well? Do the built-in accuracy tests pass?

cudawarped ( 2020-07-28 03:09:05 -0600 )

OpenCV version = 4.1.1; CUDA version = 10.1. GPU::GpuMat was a misspelling; cuda::GpuMat is right. DFT_COMPLEX_OUTPUT is not supported in the CUDA version, but as I said, the forward FFT gives the same results in both the CPU and CUDA versions, while the inverse FFT gives different results.

jianyn ( 2020-07-29 01:55:12 -0600 )

Hi, I still can't get your code to run. Can you post the exact code you are using, so I can see whether there is an issue, or whether the difference you are observing is due to an error or to the BLAS implementations? For example, is t uint8 or f32? Above it looks to be uint8, but you are trying to combine it with an f32.

cudawarped ( 2020-07-29 04:07:00 -0600 )

Hey, I've solved this issue. In the GPU version, the input data should be real, not complex, when performing the forward FFT; the inverse transform then gives back the same result. This probably has something to do with the cuFFT library in CUDA.

jianyn ( 2020-07-30 21:55:33 -0600 )
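
The fix reported above (real input for the forward transform) is consistent with how real-to-complex FFTs usually store their output. For a real H x W input the spectrum is Hermitian-symmetric, so only the first W/2 + 1 columns of each row need to be kept, and the cv::cuda::dft documentation describes the result of a real-input forward transform as a packed CV_32FC2 matrix of that width. If the real-output inverse expects that packed layout, a full-width complex spectrum would not match it, which may be why 'dst' came out wrong in the question. The sketch below is only an illustration of the packed layout in plain C++, using a naive O(N^2) DFT; it does not call OpenCV or cuFFT, and the names twiddle, forwardPacked and inversePacked are hypothetical helpers, not library functions.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

using Cpx = std::complex<double>;

// Phase term exp(sign * 2*pi*i * (u*y/H + v*x/W)) shared by both transforms.
Cpx twiddle(double sign, std::size_t u, std::size_t y, std::size_t H,
            std::size_t v, std::size_t x, std::size_t W) {
    const double pi = std::acos(-1.0);
    const double phase = static_cast<double>(u * y) / static_cast<double>(H)
                       + static_cast<double>(v * x) / static_cast<double>(W);
    return std::polar(1.0, sign * 2.0 * pi * phase);
}

// Naive forward 2D DFT of a real, row-major H x W image.
// Only the first W/2 + 1 columns of each spectrum row are stored (packed layout).
std::vector<Cpx> forwardPacked(const std::vector<double>& img,
                               std::size_t H, std::size_t W) {
    assert(img.size() == H * W);
    const std::size_t Wp = W / 2 + 1;  // packed row width
    std::vector<Cpx> spec(H * Wp);
    for (std::size_t u = 0; u < H; ++u)
        for (std::size_t v = 0; v < Wp; ++v) {
            Cpx sum(0.0, 0.0);
            for (std::size_t y = 0; y < H; ++y)
                for (std::size_t x = 0; x < W; ++x)
                    sum += img[y * W + x] * twiddle(-1.0, u, y, H, v, x, W);
            spec[u * Wp + v] = sum;
        }
    return spec;
}

// Inverse 2D DFT that takes the packed spectrum and returns a real H x W image.
// Missing columns are rebuilt from Hermitian symmetry:
// F(u, v) = conj(F((H - u) % H, (W - v) % W)).
std::vector<double> inversePacked(const std::vector<Cpx>& spec,
                                  std::size_t H, std::size_t W) {
    const std::size_t Wp = W / 2 + 1;
    assert(spec.size() == H * Wp);
    auto at = [&](std::size_t u, std::size_t v) -> Cpx {
        if (v < Wp) return spec[u * Wp + v];
        return std::conj(spec[((H - u) % H) * Wp + (W - v)]);
    };
    std::vector<double> img(H * W);
    for (std::size_t y = 0; y < H; ++y)
        for (std::size_t x = 0; x < W; ++x) {
            Cpx sum(0.0, 0.0);
            for (std::size_t u = 0; u < H; ++u)
                for (std::size_t v = 0; v < W; ++v)
                    sum += at(u, v) * twiddle(+1.0, u, y, H, v, x, W);
            // Scale on the inverse pass here; the code in the question
            // applies DFT_SCALE on the forward pass instead.
            img[y * W + x] = sum.real() / static_cast<double>(H * W);
        }
    return img;
}
```

Calling forwardPacked on a 3 x 4 image returns 3 * (4/2 + 1) = 9 complex values rather than 12, and passing those 9 values to inversePacked with the same H and W recovers the original pixels up to floating-point rounding. The same idea would apply to the GPU code in the question under the assumption stated above: upload a single-channel CV_32F image, run the forward transform on it directly, and feed the packed result to the real-output inverse.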