
NVIDIA CUDA Toolkit 5.0 Visual Profiler “Enable concurrent kernels profiling” application requirements

asked 2012-10-02 16:05:51 -0600

giu

updated 2012-10-05 09:10:17 -0600

I'm using CUDA Toolkit 5.0 because I need the new NVIDIA Visual Profiler feature that shows concurrent kernels executed asynchronously in the timeline (this is not possible with CUDA Toolkit 4.2). For this reason, I built the OpenCV 2.4.2 source code (successfully) with Toolkit 5.0 installed on my PC, and I can compile and run my application with concurrent kernels correctly: some kernels are invoked by functions of the OpenCV_GPU module, and others are kernels I wrote directly in CUDA. Unfortunately, the CUDA 5.0 NVIDIA Visual Profiler can't trace the timeline of my application when I enable the "Enable concurrent kernels profiling" option. It creates the timeline correctly both for code written using ONLY OpenCV functions and for code written using ONLY CUDA functions, but it stops working when I mix the two in the same application. I think this may be because the OpenCV calls need to use the same CUDA context as the rest of the CUDA code. How can I manage the CUDA context so that the Profiler can trace the timeline?

OS: Windows 7 64 bit; Compiler: Visual Studio 2010 Professional; Driver: 306.23; Device: GeForce GTX 680 or GeForce GT 650M

Thank you for your attention!


1 answer


answered 2012-10-06 11:22:40 -0600

giu

Well, while trying to solve my problem, I found that it was not a CUDA context problem: applications written using both CUDA and OpenCV are traced correctly by the Profiler. Instead, it was a memory problem. The application that contains both the CUDA version and the OpenCV version of my algorithm uses twice as many streams as the applications with only one version of the algorithm, and this exceeds the memory capacity of the Profiler. I thought the Profiler was at fault because the application with both methods runs correctly on its own, and it only stops when I run it from the Profiler with "Enable concurrent kernels profiling" turned on to trace the timeline. This would be explained by the Profiler using much more memory to trace the timeline in this mode, so the limit on the number of streams is lower than in the synchronous mode. However, I am a beginner, so I'd better not put forward risky hypotheses. I solved it by using fewer streams. I apologize for the misleading question.
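
For anyone hitting the same limit, here is a minimal sketch of what "using fewer streams" can look like: a small, fixed pool of streams that independent tasks reuse in round-robin order, instead of one stream per task. This is not my actual application code. The kernel `scaleKernel`, the helper `streamIndexForTask`, and the values of `kNumTasks`, `kMaxStreams` and `kChunk` are all hypothetical placeholders, and the right cap for a given Profiler run is something to find by experiment.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical stand-in for the real kernels: scales one chunk in place.
__global__ void scaleKernel(float* data, int n, float factor)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

// Map a task to one of numStreams streams in round-robin order.
inline int streamIndexForTask(int task, int numStreams)
{
    return task % numStreams;
}

// Print a message and exit if a CUDA runtime call failed.
static void check(cudaError_t err, const char* what)
{
    if (err != cudaSuccess) {
        std::fprintf(stderr, "%s failed: %s\n", what, cudaGetErrorString(err));
        std::exit(EXIT_FAILURE);
    }
}

int main()
{
    const int kNumTasks   = 64;       // assumed number of independent work items
    const int kMaxStreams = 8;        // assumed cap; lower it if the profiled run fails
    const int kChunk      = 1 << 16;  // assumed number of floats per work item

    float* d_data = 0;
    const size_t bytes = sizeof(float) * kChunk * kNumTasks;
    check(cudaMalloc((void**)&d_data, bytes), "cudaMalloc");
    check(cudaMemset(d_data, 0, bytes), "cudaMemset");

    // Create the pool once; the stream count no longer grows with the task count.
    cudaStream_t streams[kMaxStreams];
    for (int s = 0; s < kMaxStreams; ++s)
        check(cudaStreamCreate(&streams[s]), "cudaStreamCreate");

    const int threads = 256;
    const int blocks  = (kChunk + threads - 1) / threads;
    for (int t = 0; t < kNumTasks; ++t) {
        // Tasks that share a stream run in order; different streams may overlap.
        cudaStream_t stream = streams[streamIndexForTask(t, kMaxStreams)];
        scaleKernel<<<blocks, threads, 0, stream>>>(d_data + t * kChunk, kChunk, 2.0f);
        check(cudaGetLastError(), "kernel launch");
    }

    check(cudaDeviceSynchronize(), "cudaDeviceSynchronize");
    for (int s = 0; s < kMaxStreams; ++s)
        check(cudaStreamDestroy(streams[s]), "cudaStreamDestroy");
    check(cudaFree(d_data), "cudaFree");

    std::printf("ran %d tasks on %d streams\n", kNumTasks, kMaxStreams);
    return 0;
}
```

The trade-off is that tasks assigned to the same stream are serialized, so a smaller pool means less potential overlap between kernels. Halving `kMaxStreams` until the Profiler can trace the timeline is one simple way to look for a workable value.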

