I have been asked to implement a fast unsharp mask on the GPU that behaves like GIMP's and supports large kernels (e.g. 201x201). For smaller kernels I can do this by creating a filter with cv::cuda::createGaussianFilter(), applying it, and then doing a subtract and a weighted add. However, cv::cuda::createGaussianFilter() will not accept a kernel size with a width or height greater than 32, which apparently is due to a limitation in CUDA. Is there any reasonable way to do what I need with CUDA? I know very little about CUDA or image filtering, and I don't have time to learn them in depth.
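To make the steps concrete, the pipeline described above (Gaussian blur, subtract, weighted add) can be sketched on the CPU in plain C++. This is only an illustration of the arithmetic, not GPU code and not GIMP's implementation. The `Image` type, the function names `gaussianKernel1D`, `blurPass`, and `unsharpMask`, the clamp-to-edge border handling, and the `amount` parameter are all assumptions made for the sketch, and GIMP's threshold option is not modelled. The blur is written as two 1-D passes, which for a Gaussian gives the same result as the full 2-D kernel.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Grayscale float image stored row-major (hypothetical helper type for this sketch).
struct Image {
    int width;
    int height;
    std::vector<float> data;  // width * height values
    Image(int w, int h, float fill = 0.0f)
        : width(w), height(h), data(static_cast<std::size_t>(w) * h, fill) {}
    float& at(int x, int y) { return data[static_cast<std::size_t>(y) * width + x]; }
    float at(int x, int y) const { return data[static_cast<std::size_t>(y) * width + x]; }
};

// Normalised 1-D Gaussian with 2 * radius + 1 taps (radius 100 gives 201 taps).
std::vector<float> gaussianKernel1D(int radius, float sigma) {
    assert(radius >= 0 && sigma > 0.0f);
    std::vector<float> k(2 * radius + 1);
    float sum = 0.0f;
    for (int i = -radius; i <= radius; ++i) {
        k[i + radius] = std::exp(-(i * i) / (2.0f * sigma * sigma));
        sum += k[i + radius];
    }
    for (float& v : k) v /= sum;  // weights now sum to 1
    return k;
}

// One 1-D blur pass along x (horizontal) or y, clamping coordinates at the border.
Image blurPass(const Image& src, const std::vector<float>& k, bool horizontal) {
    const int radius = static_cast<int>(k.size() / 2);
    Image dst(src.width, src.height);
    for (int y = 0; y < src.height; ++y) {
        for (int x = 0; x < src.width; ++x) {
            float acc = 0.0f;
            for (int i = -radius; i <= radius; ++i) {
                const int sx = horizontal ? std::max(0, std::min(x + i, src.width - 1)) : x;
                const int sy = horizontal ? y : std::max(0, std::min(y + i, src.height - 1));
                acc += k[i + radius] * src.at(sx, sy);
            }
            dst.at(x, y) = acc;
        }
    }
    return dst;
}

// Unsharp mask: result = src + amount * (src - blurred). Output is not clamped.
Image unsharpMask(const Image& src, int radius, float sigma, float amount) {
    const std::vector<float> k = gaussianKernel1D(radius, sigma);
    const Image blurred = blurPass(blurPass(src, k, true), k, false);
    Image dst(src.width, src.height);
    for (std::size_t i = 0; i < src.data.size(); ++i) {
        const float detail = src.data[i] - blurred.data[i];  // subtract step
        dst.data[i] = src.data[i] + amount * detail;         // weighted add step
    }
    return dst;
}
```

Calling `unsharpMask(img, 100, 30.0f, 0.5f)` would correspond to a 201x201 kernel; the sigma and amount values here are placeholders, not GIMP defaults.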
Thank you,
Larry