GPU CUDA initialization much slower with OpenCV libraries

Hello all,

Prereqs for posting, my environment: Linux x86_64, OpenCV 2.4.6.1, CUDA 5.0, Tesla K20c (Kepler) GPU.

I've got a simple C++ application to benchmark CUDA performance. It makes and times the following calls once each, in order:

float *someMemory = nullptr;
cudaSetDevice(0);                                      // select device 0 (the K20c)
cudaMalloc(&someMemory, sizeof(float) * 1024 * 1024);  // 4 MB device allocation
cudaFree(someMemory);
cudaDeviceReset();
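
For reference, the timing is just a stopwatch around each call. Here's a minimal sketch of that kind of harness (not the exact benchmark code; the std::chrono wrapper and the printf formatting are illustrative, and it assumes a C++11 host compiler linked against -lcudart):

#include <chrono>
#include <cstdio>
#include <cuda_runtime.h>

// Time a single callable and return the elapsed wall-clock time in milliseconds.
template <typename F>
static double timeMs(F f) {
    auto t0 = std::chrono::steady_clock::now();
    f();
    auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(t1 - t0).count();
}

int main() {
    float *someMemory = nullptr;
    std::printf("cudaSetDevice:   %8.1f ms\n", timeMs([]  { cudaSetDevice(0); }));
    std::printf("cudaMalloc:      %8.1f ms\n", timeMs([&] { cudaMalloc((void **)&someMemory, sizeof(float) * 1024 * 1024); }));
    std::printf("cudaFree:        %8.1f ms\n", timeMs([&] { cudaFree(someMemory); }));
    std::printf("cudaDeviceReset: %8.1f ms\n", timeMs([]  { cudaDeviceReset(); }));
    return 0;
}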

With just the CUDA libraries linked, each call takes tens of milliseconds, except the malloc, which takes about 0.25 seconds. Fine... no biggie, it's all part of GPU startup cost.

Here's the weird part: if I add libopencv_gpu.so and libopencv_core.so to the linker list (-lopencv_gpu -lopencv_core), without changing the code at all, those timings go through the roof. The cudaSetDevice call takes ~2.5 seconds, and the malloc takes ~5 seconds. Subsequent calls seem just as fast as before, but a ~7.5 second startup cost is ridiculous considering it's only ~0.5 seconds without the OpenCV libraries.

Another oddity: taking out libopencv_gpu and leaving just the core library still has an effect - cudaSetDevice still takes ~2.5 seconds, and the malloc takes ~0.7 seconds. What gives?

This affects more than my benchmark app, and it is repeatable. Does anyone have any insight into why OpenCV is destroying my startup performance? I tried setting CUDA_DEVCODE_CACHE to /tmp/devcode, thinking the delay was PTX JIT compilation, but nothing was created in that directory - am I using it wrong?
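
For concreteness, this is the kind of thing I mean by "setting" the variable (just a sketch; the in-process setenv, and the assumption that it has to be set before the first CUDA runtime call, are illustrative rather than anything I've confirmed):

#include <cstdlib>
#include <cuda_runtime.h>

int main() {
    // Point the compute cache at /tmp/devcode before any CUDA runtime call,
    // on the assumption that the variable is read when the context is created.
    setenv("CUDA_DEVCODE_CACHE", "/tmp/devcode", 1 /* overwrite */);

    cudaSetDevice(0);
    cudaDeviceReset();
    return 0;
}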

Any help would be great. Thanks!
