### cv::cuda morphology much slower than cv::morphologyEx

Hi,

I am processing the two images from a stereo camera, frame by frame, in a loop over a recorded video, so the same operations run on every frame. The program runs on an Nvidia Jetson Nano, and I want to use CUDA to move the operations to the GPU to speed it up. Each image is 2208x1242 with 4 channels.

To run the morphological operations on the GPU, I used the following code:

    morph_filter_open = cv::cuda::createMorphologyFilter(cv::MORPH_OPEN, img_type, open_kernel);
    morph_filter_close = cv::cuda::createMorphologyFilter(cv::MORPH_CLOSE, img_type, close_kernel);


and

    void Morphology::open(cv::cuda::GpuMat img, cv::cuda::GpuMat out){
        morph_filter_open->apply(img, out);
    }

    void Morphology::close(cv::cuda::GpuMat img, cv::cuda::GpuMat out){
        morph_filter_close->apply(img, out);
    }


The kernel is a standard cv::Mat, and img_type is an int with value 0. The functions are called like this:

    img_left_gpu.upload(img_left);

    start = std::chrono::high_resolution_clock::now();
    morphology.open(img_left_gpu, img_left_gpu);
    morphology.open(img_right_gpu, img_right_gpu);
    morphology.close(img_left_gpu, img_left_gpu);
    morphology.close(img_right_gpu, img_right_gpu);
    finish = std::chrono::high_resolution_clock::now();
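
For reference, the value 0 passed as img_type can be decoded with the same arithmetic OpenCV uses for its type codes. The sketch below only mirrors that arithmetic in plain C++ and does not use OpenCV itself; the names `make_type`, `channels_of` and `depth_of` are hypothetical helpers, not OpenCV functions. Under that encoding, 0 is an 8-bit single-channel type, while an 8-bit 4-channel image would have type code 24.

```cpp
// Mirrors the arithmetic of OpenCV's CV_MAKETYPE / CV_MAT_CN / CV_MAT_DEPTH
// macros: the low 3 bits hold the depth, the bits above hold (channels - 1).
// All names below are illustrative helpers, not part of OpenCV.
constexpr int kCnShift = 3;                      // same value as CV_CN_SHIFT
constexpr int kDepthMask = (1 << kCnShift) - 1;  // 7, same as CV_MAT_DEPTH_MASK
constexpr int kDepth8U = 0;                      // same value as CV_8U

// Builds a type code from a depth and a channel count.
constexpr int make_type(int depth, int channels) {
    return (depth & kDepthMask) + ((channels - 1) << kCnShift);
}

// Extracts the channel count from a type code.
constexpr int channels_of(int type) {
    return (type >> kCnShift) + 1;
}

// Extracts the depth from a type code.
constexpr int depth_of(int type) {
    return type & kDepthMask;
}

// Compile-time checks of the two cases relevant to this question.
static_assert(make_type(kDepth8U, 1) == 0, "8-bit, 1 channel");
static_assert(make_type(kDepth8U, 4) == 24, "8-bit, 4 channels");
```

So `channels_of(0)` is 1 and `channels_of(24)` is 4, which is worth comparing against the 4-channel images described above.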


Opening and closing both images on the GPU takes about 1.5 s, whereas the same operations with cv::morphologyEx on the CPU take only about 0.07 s.

As you can see, I upload the images to the GPU before starting the timer, so my understanding is that the copy operation, although it may also take relatively long, cannot be the problem here. Or am I wrong?
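
One way to tell whether the 1.5 s is a per-frame cost or a one-time cost of the first call would be to run the operations a few times untimed and then average over several timed runs. The following is a minimal sketch using only the standard library; `mean_seconds_after_warmup` is a hypothetical helper, and the callable passed to it is assumed to wrap the four open/close calls shown above. Whether first-call overhead is actually the cause here has not been confirmed.

```cpp
#include <chrono>
#include <functional>

// Hypothetical helper: calls `work` `warmup` times without timing, then
// returns the mean wall-clock time in seconds per call over `iterations`
// timed calls. Returns 0.0 if `iterations` is not positive.
double mean_seconds_after_warmup(const std::function<void()>& work,
                                 int warmup, int iterations) {
    if (iterations <= 0) {
        return 0.0;
    }

    // Untimed runs, so any one-time setup cost is excluded from the result.
    for (int i = 0; i < warmup; ++i) {
        work();
    }

    // steady_clock is monotonic, which makes it suitable for intervals.
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < iterations; ++i) {
        work();
    }
    const auto finish = std::chrono::steady_clock::now();

    const std::chrono::duration<double> elapsed = finish - start;
    return elapsed.count() / iterations;
}
```

It could be called as `mean_seconds_after_warmup([&]{ /* open and close calls */ }, 1, 10);`. If the mean after warm-up is far below 1.5 s, the original measurement was dominated by the first call.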
