Hi,
I'd like to know whether the following CUDA design (in pseudocode) is feasible.
One dispatcher CPU thread that will:
-- initialize the CUDA streams, say 12 different streams for example; each stream may run the same GPU code
-- manage I/O frames on this CPU thread with VideoCapture/VideoWriter,
-- for each frame:
--- feed one free stream of the 12 CUDA streams asynchronously, to get the best use of the total transfer bandwidth, with an optimized struct for each data transfer (is async DMA transfer possible on every graphics card, or only from compute capability x.x upward?)
--- release the CPU until a callback arrives from the GPU: is this possible?
--- asynchronously receive the resulting data from any of the 12 GPU streams: this wakes up the CPU thread, which handles the VideoWriter and sends a new frame to that free GPU stream, and so on?
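To make the question more concrete, here is a rough sketch of what I have in mind, written against the CUDA runtime API as I understand it. Everything named here is a placeholder of mine: invertKernel stands in for the real GPU code, Slot and processFrames are hypothetical helpers, and the memset / comparison loop stand in for VideoCapture / VideoWriter. Instead of a true callback it waits on an event created with cudaEventBlockingSync, which as far as I can tell lets the CPU thread sleep rather than spin. It also waits on the oldest stream rather than on "any" stream, an assumption I made so that frames come back in order.

```cuda
#include <cuda_runtime.h>
#include <cstdio>
#include <cstring>
#include <vector>

// Placeholder kernel (hypothetical): inverts every byte of a frame.
__global__ void invertKernel(const unsigned char* in, unsigned char* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = 255 - in[i];
}

// One slot per stream: pinned host buffers, device buffers, completion event.
struct Slot {
    cudaStream_t stream;
    cudaEvent_t done;
    unsigned char *hIn, *hOut, *dIn, *dOut;
    int frame;  // index of the frame in flight, -1 if the slot is free
};

#define CHECK(call) do { cudaError_t e_ = (call); if (e_ != cudaSuccess) { \
    fprintf(stderr, "%s: %s\n", #call, cudaGetErrorString(e_)); return -1; } } while (0)

// Returns the number of frames whose result was correct, or -1 on a CUDA error.
// (Sketch only: resources are not released on the error path.)
int processFrames(int numFrames, int numStreams = 12, int frameBytes = 640 * 480) {
    std::vector<Slot> slots(numStreams);
    for (Slot& s : slots) {
        CHECK(cudaStreamCreate(&s.stream));
        // Blocking sync: the waiting CPU thread yields instead of busy-waiting.
        CHECK(cudaEventCreateWithFlags(&s.done,
                                       cudaEventBlockingSync | cudaEventDisableTiming));
        // Pinned (page-locked) host memory is required for truly async copies.
        CHECK(cudaMallocHost((void**)&s.hIn, frameBytes));
        CHECK(cudaMallocHost((void**)&s.hOut, frameBytes));
        CHECK(cudaMalloc((void**)&s.dIn, frameBytes));
        CHECK(cudaMalloc((void**)&s.dOut, frameBytes));
        s.frame = -1;
    }

    int ok = 0;
    // The extra numStreams iterations drain the frames still in flight.
    for (int f = 0; f < numFrames + numStreams; ++f) {
        Slot& s = slots[f % numStreams];
        if (s.frame >= 0) {
            CHECK(cudaEventSynchronize(s.done));  // sleep until this slot is done
            // Stand-in for VideoWriter: check the result instead of writing it.
            unsigned char expected = (unsigned char)(255 - (s.frame & 0xFF));
            bool good = true;
            for (int i = 0; i < frameBytes; ++i)
                if (s.hOut[i] != expected) { good = false; break; }
            if (good) ++ok;
            s.frame = -1;
        }
        if (f < numFrames) {
            memset(s.hIn, f & 0xFF, frameBytes);  // stand-in for VideoCapture
            CHECK(cudaMemcpyAsync(s.dIn, s.hIn, frameBytes,
                                  cudaMemcpyHostToDevice, s.stream));
            invertKernel<<<(frameBytes + 255) / 256, 256, 0, s.stream>>>(
                s.dIn, s.dOut, frameBytes);
            CHECK(cudaGetLastError());
            CHECK(cudaMemcpyAsync(s.hOut, s.dOut, frameBytes,
                                  cudaMemcpyDeviceToHost, s.stream));
            CHECK(cudaEventRecord(s.done, s.stream));  // marks the end of this frame
            s.frame = f;
        }
    }

    for (Slot& s : slots) {
        cudaFree(s.dIn);
        cudaFree(s.dOut);
        cudaFreeHost(s.hIn);
        cudaFreeHost(s.hOut);
        cudaEventDestroy(s.done);
        cudaStreamDestroy(s.stream);
    }
    return ok;
}
```

Calling processFrames(100) from main should return 100 if every frame made the round trip. Whether the copies and kernels of different streams really overlap presumably depends on the card, which is exactly the part I am unsure about.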
Which low-cost NVIDIA card would you recommend for the best results?
I understand the GPU class currently targets "compute capability 1.3", but will it be 2.0 or higher in the near future?
Thanks for your answers ;)