
Optimized async GPU stream usage in video post-processing

Hi,

I'd like to know whether the following CUDA pseudocode is feasible:

1 dispatcher CPU thread that will:

-- initialize the CUDA streams, say 12 different streams for example; every stream may run the same GPU code,

-- manage frame I/O on this CPU thread with VideoCapture/VideoWriter,

-- for each frame:
--- feed 1 free stream out of the 12 CUDA streams asynchronously, to make the best use of the total transfer bandwidth, with an optimized struct for each data transfer (is async DMA transfer possible on every graphics card, or only from compute capability x.x upwards?),
--- release the CPU until the GPU fires a callback: is that possible?
--- receive the resulting data asynchronously from any of the 12 GPU streams: this wakes up the CPU thread, which hands the frame to the VideoWriter and sends a new frame to the stream that just became free, and so on?
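
To make the idea concrete, here is a minimal sketch of the dispatcher loop described above. It is only an illustration under several assumptions: it uses `cudaLaunchHostFunc` (so CUDA 10 or later) for the GPU-to-CPU callback, pinned host buffers from `cudaMallocHost` so the copies can really run asynchronously, a placeholder `invert` kernel instead of the real post-processing, and `memset` as a stand-in for VideoCapture/VideoWriter. The names `Slot`, `submit`, `onDone` and `runPipeline` are hypothetical.

```cuda
#include <cuda_runtime.h>
#include <condition_variable>
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <mutex>
#include <queue>
#include <vector>

#define CHECK(call)                                                           \
    do {                                                                      \
        cudaError_t e_ = (call);                                              \
        if (e_ != cudaSuccess) {                                              \
            std::fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(e_)); \
            std::exit(EXIT_FAILURE);                                          \
        }                                                                     \
    } while (0)

// Placeholder for the real post-processing: inverts every byte of the frame.
__global__ void invert(unsigned char* px, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) px[i] = 255 - px[i];
}

struct Slot {              // one per CUDA stream
    int id;
    int frame;             // index of the frame currently in flight
    cudaStream_t stream;
    unsigned char* host;   // pinned host buffer (needed for async copies)
    unsigned char* dev;    // device buffer
};

static std::mutex mtx;
static std::condition_variable wake;
static std::queue<int> finished;  // ids of slots whose result is back on the host

// Called by the CUDA runtime once all earlier work in the stream is done.
// CUDA API calls are not allowed in here, so it only signals the dispatcher.
static void CUDART_CB onDone(void* userData) {
    Slot* s = static_cast<Slot*>(userData);
    {
        std::lock_guard<std::mutex> lk(mtx);
        finished.push(s->id);
    }
    wake.notify_one();
}

// Queue upload, kernel, download and callback on the slot's stream; returns at once.
static void submit(Slot& s, int frame, int bytes) {
    s.frame = frame;
    std::memset(s.host, frame & 0xFF, bytes);  // stand-in for VideoCapture::read
    CHECK(cudaMemcpyAsync(s.dev, s.host, bytes, cudaMemcpyHostToDevice, s.stream));
    invert<<<(bytes + 255) / 256, 256, 0, s.stream>>>(s.dev, bytes);
    CHECK(cudaGetLastError());
    CHECK(cudaMemcpyAsync(s.host, s.dev, bytes, cudaMemcpyDeviceToHost, s.stream));
    CHECK(cudaLaunchHostFunc(s.stream, onDone, &s));
}

// Returns how many frames came back with the expected (inverted) content.
int runPipeline(int numStreams, int numFrames, int bytes) {
    std::vector<Slot> slots(numStreams);
    for (int i = 0; i < numStreams; ++i) {
        slots[i].id = i;
        CHECK(cudaStreamCreate(&slots[i].stream));
        CHECK(cudaMallocHost(&slots[i].host, bytes));
        CHECK(cudaMalloc(&slots[i].dev, bytes));
    }
    int next = 0, done = 0, ok = 0;
    for (int i = 0; i < numStreams && next < numFrames; ++i)  // fill every stream once
        submit(slots[i], next++, bytes);
    while (done < numFrames) {
        int id;
        {   // the dispatcher thread sleeps here until some stream has finished
            std::unique_lock<std::mutex> lk(mtx);
            wake.wait(lk, [] { return !finished.empty(); });
            id = finished.front();
            finished.pop();
        }
        Slot& s = slots[id];
        // stand-in for VideoWriter::write: just verify the first byte
        if (s.host[0] == static_cast<unsigned char>(255 - (s.frame & 0xFF))) ++ok;
        ++done;
        if (next < numFrames) submit(s, next++, bytes);  // reuse the free stream
    }
    for (Slot& s : slots) {
        CHECK(cudaStreamDestroy(s.stream));
        CHECK(cudaFreeHost(s.host));
        CHECK(cudaFree(s.dev));
    }
    return ok;
}
```

A call such as `runPipeline(12, 100, 1920 * 1080 * 3)` would push 100 dummy frames through 12 streams. Note that streams can finish in a different order than the frames were submitted, so a real writer would probably have to reorder frames by `frame` index before handing them to the VideoWriter.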

Which low-cost Nvidia card would you advise for the best results?

I understand the GPU class currently targets "compute capability 1.3", but will it move to 2.0 or higher in the near future?
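
For checking a given card, a small sketch like the following could print what the CUDA runtime reports for each device: its compute capability and its number of asynchronous copy engines (`asyncEngineCount` in `cudaDeviceProp`). A value above zero means the device can copy between pinned host memory and device memory while a kernel is running. The function name `reportDevices` is hypothetical.

```cuda
#include <cuda_runtime.h>
#include <cstdio>

// Prints one line per CUDA device and returns how many of them
// report at least one asynchronous copy engine.
int reportDevices() {
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess) return 0;
    int capable = 0;
    for (int dev = 0; dev < count; ++dev) {
        cudaDeviceProp p;
        if (cudaGetDeviceProperties(&p, dev) != cudaSuccess) continue;
        std::printf("%s: compute capability %d.%d, %d async copy engine(s)\n",
                    p.name, p.major, p.minor, p.asyncEngineCount);
        if (p.asyncEngineCount > 0) ++capable;
    }
    return capable;
}
```

The number of copy engines is worth looking at before settling on 12 streams, since copies going in the same direction are likely to queue up on the same engine no matter how many streams issue them.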

Thanks for your answers ;)

Copyright OpenCV foundation, 2012-2018. Content on this site is licensed under a Creative Commons Attribution Share Alike 3.0 license.

