I'd guess you're allocating a GpuMat inside the loop body on each iteration. I've found that a critical optimization for CUDA operations is to preallocate all GpuMats outside the loop and never create them as short-lived locals. Similarly, don't resize a GpuMat once allocated, since that forces a reallocation. The CUDA kernels themselves seem blindingly fast, but GPU memory allocations are not.
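As a rough sketch of the pattern (assuming OpenCV built with the CUDA modules and a frame size known up front; the 1920x1080 dimensions and the grayscale conversion are just placeholders for whatever your pipeline does):

```cpp
#include <opencv2/core/cuda.hpp>
#include <opencv2/cudaimgproc.hpp>
#include <opencv2/imgproc.hpp>
#include <opencv2/videoio.hpp>

void process(cv::VideoCapture& cap) {
    cv::Mat frame;

    // Preallocate once, outside the loop, at the full frame size/type.
    cv::cuda::GpuMat d_src(1080, 1920, CV_8UC3);
    cv::cuda::GpuMat d_gray(1080, 1920, CV_8UC1);

    while (cap.read(frame)) {
        // upload() and cvtColor() reuse the existing device buffers as
        // long as the size and type match, so no per-frame cudaMalloc.
        d_src.upload(frame);
        cv::cuda::cvtColor(d_src, d_gray, cv::COLOR_BGR2GRAY);
        // ... further GPU work on d_gray ...
    }
}
```

The key point is that `GpuMat` output arguments only reallocate when the requested size or type differs from what they already hold, so keeping the same preallocated mats across iterations avoids the allocation cost entirely.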