cv::cuda::setDevice() takes forever. How can I speed it up?

Calling cv::cuda::setDevice() takes a whopping 48.48 seconds on my machine. This seems way too slow. Any idea what's wrong, and how I can speed it up?

GPU info:

Device 0: "GeForce GTX 1060 6GB"
  CUDA Driver Version / Runtime Version          8.0 / 7.5
  CUDA Capability Major/Minor version number:    6.1

Relevant CMake flags when I compiled OpenCV:


While googling the issue, I read that it might be due to JIT compilation, and that I should compile the binaries with specific compute flags. However, I'm not exactly sure what this means. Are there some flags I should add when compiling my program, or do I need to somehow reinstall CUDA with more architectures enabled?

