The labelComponents algorithm is legacy and uses the graph cut API from Nvidia's NPP library. Unfortunately, this API was removed by Nvidia in CUDA 8.0.
If you read the git log for modules/cudalegacy/src/graphcuts.cpp you can see that it was moved from cuda to cudalegacy in Jan 2015, probably because of the removal of the above API.
If you look at the source for labelComponents at line 46, you can see that for versions of CUDA greater than or equal to 8.0 only the headers are included, which allows OpenCV to compile with the legacy modules selected.
Because you are 100% sure that you need to process on the GPU, I dug out some timing results I had for a CUDA CCL implementation I worked on at a previous company, to give you an indication of the performance increase that is achievable when performing CCL on the GPU.
The comparison was connectedComponentsWithStats and cvFindContours run on a laptop i7 against our CCL implementation on a GTX 980M with CUDA 8.0 (no cooperative groups).
Because CCL algorithms on the GPU are iterative, the execution time is highly dependent on the type of image you are looking at. For example, our CUDA implementation was only slightly faster on the 1080p maze image below:
cvFindContours: 16.3ms
connectedComponentsWithStats: 4.5ms
customCCL: 3.9ms
However, on the classic foreground image below (enlarged to 960x2240 to saturate the GPU), our implementation was over 2x faster than connectedComponentsWithStats:
cvFindContours: 0.63ms
connectedComponentsWithStats: 3.2ms
customCCL: 1.4ms
However, on this type of image cvFindContours performs extremely efficiently.
Our bespoke implementation of the CCL algorithm was competitive, being significantly faster than ArrayFire's regions, mainly because it required fewer iterations to converge.
Therefore, if I were you I would try processing the types of images you will be using, possibly with ArrayFire's regions (if you don't want to install an older version of CUDA to use labelComponents), to get an idea of what the cost will be on the GPU, before you completely write off transferring to the CPU and running CCL there in parallel with the GPU.