My multithreaded application (C++, OpenCV 2.4.10, CUDA 5.2) crashes frequently, and the debugger always halts at a "cudaSafeCall( cudaDeviceSynchronize() );" call.
However, the failing "cudaDeviceSynchronize" call is reached from different functions: cv::gpu::resize (resize.cu, line 265), cv::gpu::abs (transform_detail.hpp, line 364), and cv::gpu::threshold (transform_detail.hpp, line 364).
I'm using a single GPU device (NVIDIA GeForce GTX 960) with the default GPU device context. All application threads call OpenCV GPU functions and share that same device context.
What am I doing wrong? Or is there a common problem with multithreaded applications that use OpenCV GPU support?