cv::cuda::setDevice() takes forever, how can I speed it up?
Running
cv::cuda::setDevice(0);
takes a whopping 48.48 seconds on my machine. This seems way too slow. Any idea what's wrong, and how I can speed it up?
GPU info:
Device 0: "GeForce GTX 1060 6GB"
CUDA Driver Version / Runtime Version 8.0 / 7.5
CUDA Capability Major/Minor version number: 6.1
Relevant CMake flags I used when compiling OpenCV:
-D WITH_CUBLAS=1 \
-D ENABLE_FAST_MATH=1 \
-D CUDA_FAST_MATH=1 \
-D CUDA_ARCH_PTX=5.2 \
While googling the issue, I read that it might be caused by JIT compilation, and that I should compile the binaries with specific compute flags. However, I'm not exactly sure what this means. Are there some flags I should add when compiling my program, or do I need to somehow reinstall CUDA with more architectures enabled?
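To show what "JIT compiling" means here, the sketch below is a simplified model of how the CUDA driver chooses kernel code for a device. It is not an OpenCV or CUDA API, and the names `Arch`, `LoadPath` and `how_kernels_load` are hypothetical. It encodes two documented CUDA rules. A binary (cubin) built for sm_XY runs only on devices with the same major version X and a minor version of at least Y. PTX built for compute_XY can be JIT-compiled by the driver for any device whose compute capability is at least X.Y. `CUDA_ARCH_PTX=5.2` embeds compute_52 PTX, and the GTX 1060 is 6.1. The flags above do not show `CUDA_ARCH_BIN`, so the model simply assumes the build contains no 6.1 binary. Under that assumption, the driver has to compile the PTX when the context is created.

```cpp
#include <vector>

// Compute capability, e.g. {6, 1} for a GTX 1060.
struct Arch {
    int major;
    int minor;
};

// How the kernels for a given device would be obtained (simplified model).
enum class LoadPath { Native, JitFromPtx, Unsupported };

// bin_archs: architectures with prebuilt cubins (like CUDA_ARCH_BIN).
// ptx_archs: architectures with embedded PTX (like CUDA_ARCH_PTX).
LoadPath how_kernels_load(Arch device,
                          const std::vector<Arch>& bin_archs,
                          const std::vector<Arch>& ptx_archs) {
    // A cubin is usable only with the same major version and an equal or lower minor version.
    for (const Arch& b : bin_archs) {
        if (b.major == device.major && b.minor <= device.minor) {
            return LoadPath::Native;  // no JIT step needed
        }
    }
    // PTX for an equal or older compute capability can be JIT-compiled by the driver.
    for (const Arch& p : ptx_archs) {
        const bool older_or_equal =
            p.major < device.major ||
            (p.major == device.major && p.minor <= device.minor);
        if (older_or_equal) {
            return LoadPath::JitFromPtx;  // driver compiles PTX at startup
        }
    }
    return LoadPath::Unsupported;
}
```

For example, `how_kernels_load({6, 1}, {}, {{5, 2}})` yields `LoadPath::JitFromPtx`, while adding a 6.1 entry to the binary list yields `LoadPath::Native`.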