Hi,
I am processing the left and right images of a stereo camera, taken from a video I recorded. The processing runs in a loop, so the same operations are applied to every frame. The program runs on an NVIDIA Jetson Nano, and to speed it up I want to use CUDA to run the operations on the GPU. The images are 2208x1242 with 4 channels.
To run the morphological operations on the GPU, I create the filters like this:
```cpp
morph_filter_open = cv::cuda::createMorphologyFilter(cv::MORPH_OPEN, img_type, open_kernel);
morph_filter_close = cv::cuda::createMorphologyFilter(cv::MORPH_CLOSE, img_type, close_kernel);
```
and apply them with:
```cpp
// The output is taken by reference so that a (re)allocated result reaches the caller.
void Morphology::open(const cv::cuda::GpuMat& img, cv::cuda::GpuMat& out) {
    morph_filter_open->apply(img, out);
}

void Morphology::close(const cv::cuda::GpuMat& img, cv::cuda::GpuMat& out) {
    morph_filter_close->apply(img, out);
}
```
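To check the GPU output on small test images, it can help to have a plain reference of what the two filters compute. The sketch below is a hypothetical standard-C++ implementation (the `Gray` type and all function names are mine, not OpenCV's) for a single-channel 8-bit image and a square kernel: opening is an erosion followed by a dilation, closing is a dilation followed by an erosion. Pixels outside the image are simply skipped, which is an assumption of this sketch and may not match the border handling of the CUDA filter exactly.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical helper type for this sketch: a single-channel 8-bit image
// stored row-major (comparable to a CV_8UC1 cv::Mat).
struct Gray {
    int rows = 0, cols = 0;
    std::vector<std::uint8_t> px;
    std::uint8_t at(int r, int c) const { return px[r * cols + c]; }
};

// Minimum (erode) or maximum (dilate) over a (2k+1) x (2k+1) square window.
// Pixels outside the image are skipped rather than padded (assumption).
Gray rectFilter(const Gray& in, int k, bool takeMax) {
    Gray out{in.rows, in.cols, std::vector<std::uint8_t>(in.px.size())};
    for (int r = 0; r < in.rows; ++r) {
        for (int c = 0; c < in.cols; ++c) {
            std::uint8_t v = takeMax ? 0 : 255;
            for (int dr = -k; dr <= k; ++dr) {
                for (int dc = -k; dc <= k; ++dc) {
                    int rr = r + dr, cc = c + dc;
                    if (rr < 0 || rr >= in.rows || cc < 0 || cc >= in.cols) continue;
                    std::uint8_t p = in.at(rr, cc);
                    v = takeMax ? std::max(v, p) : std::min(v, p);
                }
            }
            out.px[r * in.cols + c] = v;
        }
    }
    return out;
}

Gray erode(const Gray& in, int k)  { return rectFilter(in, k, false); }
Gray dilate(const Gray& in, int k) { return rectFilter(in, k, true); }

// Opening: erosion then dilation, removes bright specks smaller than the kernel.
Gray morphOpen(const Gray& in, int k)  { return dilate(erode(in, k), k); }
// Closing: dilation then erosion, fills dark holes smaller than the kernel.
Gray morphClose(const Gray& in, int k) { return erode(dilate(in, k), k); }
```

For example, `morphOpen` with `k = 1` (a 3x3 kernel) turns a single bright pixel on a black background into an all-black image, and `morphClose` fills a single dark pixel in a white image.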
The kernels are standard `cv::Mat`s, and `img_type` is just an `int` with value 0 (which is `CV_8UC1`).
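Since the images are described as 4-channel while `img_type` is 0, it may be worth spelling out how OpenCV packs a type code: the depth sits in the low 3 bits and `channels - 1` sits above them, so 0 means 8-bit single channel, while an 8-bit 4-channel image is 24 (`CV_8UC4`). The helpers below are a hypothetical re-implementation for illustration only; in real code the `CV_8UC1`/`CV_8UC4` constants or the `CV_MAKETYPE` macro would be used instead.

```cpp
// Hypothetical re-implementation of OpenCV's type-code packing, for illustration.
constexpr int kDepth8U = 0;      // value of CV_8U
constexpr int kChannelShift = 3; // value of CV_CN_SHIFT

// Pack depth and channel count into one type code (like CV_MAKETYPE).
constexpr int makeType(int depth, int channels) {
    return depth + ((channels - 1) << kChannelShift);
}

// Recover depth and channel count from a packed type code.
constexpr int depthOf(int type) { return type & ((1 << kChannelShift) - 1); }
constexpr int channelsOf(int type) { return (type >> kChannelShift) + 1; }

static_assert(makeType(kDepth8U, 1) == 0, "CV_8UC1 is 0");
static_assert(makeType(kDepth8U, 4) == 24, "CV_8UC4 is 24");
```

So a filter created with type 0 describes single-channel 8-bit images, and a 4-channel 8-bit image would correspond to type 24.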
The functions are called like this:
```cpp
img_left_gpu.upload(img_left);
img_right_gpu.upload(img_right);
start = std::chrono::high_resolution_clock::now();
morphology.open(img_left_gpu, img_left_gpu);
morphology.open(img_right_gpu, img_right_gpu);
morphology.close(img_left_gpu, img_left_gpu);
morphology.close(img_right_gpu, img_right_gpu);
finish = std::chrono::high_resolution_clock::now();
```
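One way to structure this measurement is to time the first call separately from the following ones, so that any one-off cost on the first call (lazy initialization, buffer allocation and so on, if it happens) does not get mixed into the steady-state number. This is a minimal standard-C++ sketch; `timeCalls` and `Timing` are hypothetical names, and it uses `std::chrono::steady_clock`, which, unlike `high_resolution_clock`, is guaranteed to be monotonic.

```cpp
#include <algorithm>
#include <chrono>
#include <functional>
#include <vector>

// Hypothetical result type: time of the first ("cold") call and the
// (upper) median of the following ("warm") calls, both in milliseconds.
struct Timing {
    double coldMs;
    double warmMedianMs;
};

// Runs `work` once on its own, then `runs` more times, timing each call.
Timing timeCalls(const std::function<void()>& work, int runs) {
    using Clock = std::chrono::steady_clock;
    auto elapsedMs = [](Clock::time_point a, Clock::time_point b) {
        return std::chrono::duration<double, std::milli>(b - a).count();
    };

    auto t0 = Clock::now();
    work();
    Timing result{elapsedMs(t0, Clock::now()), 0.0};

    std::vector<double> warm;
    for (int i = 0; i < runs; ++i) {
        auto t = Clock::now();
        work();
        warm.push_back(elapsedMs(t, Clock::now()));
    }
    if (!warm.empty()) {
        // Partial sort is enough to pick the middle element.
        std::nth_element(warm.begin(), warm.begin() + warm.size() / 2, warm.end());
        result.warmMedianMs = warm[warm.size() / 2];
    }
    return result;
}
```

In the loop from the post, `work` would wrap the four `open`/`close` calls, e.g. `timeCalls([&] { morphology.open(img_left_gpu, img_left_gpu); /* ... */ }, 20);`. If the filters are applied on a `cv::cuda::Stream` (the optional last argument of `apply`), the lambda should end with `stream.waitForCompletion()` so the clock only stops once the GPU work has finished.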
Opening and closing on the GPU take about 1.5 s for both images, whereas the same operations with `cv::morphologyEx` on the CPU take only about 0.07 s.
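To put the two timings in the same units, the figures from the post can be converted into megapixels processed per second, counting each `open` or `close` call on one image as one pass over its pixels (a simplification, since each of those calls internally does more than one pass). This is only arithmetic on the numbers above, written as a small C++ snippet with hypothetical constant names:

```cpp
// Figures from the post: 2208x1242 images, 2 images per frame,
// 2 morphology calls (open, close) per image.
constexpr long long kPixelsPerImage = 2208LL * 1242LL;
constexpr long long kPixelCallsPerFrame = kPixelsPerImage * 2 /*images*/ * 2 /*calls*/;

// Megapixels processed per second for a measured frame time in seconds.
constexpr double megapixelsPerSecond(double seconds) {
    return kPixelCallsPerFrame / seconds / 1e6;
}
```

With 1.5 s this gives roughly 7.3 Mpx/s on the GPU, and with 0.07 s roughly 157 Mpx/s on the CPU, so the GPU path is about 21 times slower per frame.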
As you can see, I upload the images to the GPU before starting the timer, so my understanding is that the host-to-device copy, although it may also take relatively long, cannot be the cause here. Or am I wrong?
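For reference, this is how much data the two uploads move per frame. The function is a hypothetical helper; it assumes 8-bit pixels and tightly packed host rows, and it ignores any row padding on the device side (`GpuMat` allocations may be pitched).

```cpp
#include <cstddef>

// Bytes copied host-to-device per frame for `images` 8-bit images of the
// given size and channel count (assumes tightly packed rows).
constexpr std::size_t uploadBytesPerFrame(std::size_t cols, std::size_t rows,
                                          std::size_t channels, std::size_t images = 2) {
    return cols * rows * channels * images;
}
```

With 2208x1242 images and 4 channels this is 21,938,688 bytes (about 20.9 MiB) per frame; with a single channel it would be a quarter of that, 5,484,672 bytes.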
Thank you for your help!