No effect from using cuda::Stream?
I was experimenting with passing a Stream to the cv::cuda functions to see what performance impact it has, but the call very often takes just as long with the stream parameter as without it. I expected the call with the stream to enqueue an asynchronous operation and return almost immediately, but it seems to block for the full duration of the operation itself.
For example, with a GpuMat already uploaded, I tried
gpuImage.download(cpuImage);
and
cv::cuda::Stream myStream;
gpuImage.download(cpuImage, myStream);
and I saw no difference in timing whatsoever.
Is there something I'm missing about how to use these?
Same problem for me. It feels like Stream has no effect or is not implemented.
Edit: I found the problem. For Stream to have an effect, you need to make sure that every destination GpuMat already has memory allocated to it...
Could you elaborate on allocating memory to the GpuMat? I have experienced the same issue: when I time an asynchronous call, it takes as long as the synchronous one.