VS2010 NSight 3.1: debugging non-OpenCV CUDA kernels
Hi,
I have several custom CUDA kernels that I need to debug. The application also contains OpenCV GPU code, but I'm not trying to debug any of that. If I comment out all the OpenCV function calls, I can set a breakpoint in the NSight debugger and it is hit just fine. If I put the OpenCV calls back in, my breakpoint is never hit. I believe, though I'm not sure, that the cause is that all the OpenCV GPU kernel modules (about 40 of them) are loaded into the debugger, possibly causing a timeout or some other failure in NSight. I'm not calling most of those kernels in my code, but I think they get loaded when the OpenCV GPU subsystem starts up. I've also tried running against the release build of the OpenCV libraries, with the same result.
So, my questions are:
(1) Has anyone else experienced this?
(2) Can I either speed up or prevent the loading of all the unused OpenCV CUDA kernels?
(3) Does anyone know of NSight configuration settings that would make this work even with all the kernels being loaded?
I'm not very concerned about debugging speed or performance; I just need my breakpoint to be hit while debugging my non-OpenCV code.
Thanks for any help!
My system: Windows 7 64-bit, Visual Studio 2010 SP1, NSight 3.1, CUDA 5.5.12, OpenCV 2.4.6.0 built with CUDA support
Edit: I've also found that if I remove the OpenCV binaries I built with CUDA support from my PATH and use the prebuilt non-GPU binaries instead, my breakpoint is hit. This makes me even more convinced that the GPU binaries are loading or doing something that prevents the breakpoint in my custom CUDA code from being hit.
Edit: Further testing, including rebuilding OpenCV with CUDA support but without debug information, gave the same result. If I replace the opencv_gpu246*.dll files with the prebuilt binaries, the issue goes away and I can debug my custom kernels. Obviously, though, I then can't use any OpenCV GPU functions in my application, and my pipeline calls those before my custom kernel runs.
Edit: I'm not the only one experiencing this; it appears to be the same issue as this unanswered question.