Fastest way to stream frames from a camera through a GPU-assisted function?
I am trying to use the GPU on the NVIDIA Tegra processor to speed up some vision calculations.
I've used OpenCV for non-GPU-accelerated processing and am generally familiar with it in that context.
My application involves reading frames from a USB3 camera, which I'd like to process as fast as possible through some OpenCV operations that are available in GPU-accelerated form.
I've read the documentation for the OpenCV "gpu" module, and based on what I've read, I've laid out the following general approach (a rough sketch follows the list):
1. Read a frame from the camera.
2. Choose the next unused GPU i and set it as the active device with a setDevice call.
3. Using a Stream object corresponding to GPU i, enqueue an upload, some image operations, and a download of the result back to a Mat, then a callback with the frame sequence number.
4. In the callback, mark GPU i as available for work and initiate further processing of the downloaded result.
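Here's a rough, untested sketch of that loop as I currently understand it. I'm assuming the OpenCV 2.4 "gpu" API (gpu::Stream, gpu::CudaMem, gpu::setDevice), using gpu::cvtColor as a stand-in for the real operations, and using waitForCompletion in place of the callback; the camera index and buffer types are just placeholders:

```cpp
#include <vector>
#include <opencv2/opencv.hpp>
#include <opencv2/gpu/gpu.hpp>

using namespace cv;

int main()
{
    VideoCapture cap(0);                       // USB3 camera (index 0 is a placeholder)
    if (!cap.isOpened()) return -1;

    const int nDevices = gpu::getCudaEnabledDeviceCount();
    if (nDevices == 0) return -1;

    // One stream and one set of buffers per device; each stream is created
    // while its own device is current.
    std::vector<gpu::Stream>  streams;
    std::vector<gpu::GpuMat>  d_src(nDevices), d_dst(nDevices);
    std::vector<gpu::CudaMem> h_in(nDevices), h_out(nDevices);  // page-locked host buffers
    std::vector<bool>         busy(nDevices, false);

    for (int i = 0; i < nDevices; ++i)
    {
        gpu::setDevice(i);
        streams.push_back(gpu::Stream());
    }

    Mat frame;
    for (int seq = 0; ; ++seq)
    {
        cap >> frame;                          // read frame from camera
        if (frame.empty()) break;

        const int i = seq % nDevices;          // "next" device, round-robin
        gpu::setDevice(i);

        if (busy[i])                           // device i still holds the frame it got nDevices ago
        {
            streams[i].waitForCompletion();    // stand-in for the callback step
            Mat result = h_out[i].createMatHeader();
            // ... further CPU-side processing of 'result' for frame (seq - nDevices) ...
        }

        // create() is a no-op if the buffers are already the right size/type
        h_in[i].create(frame.rows, frame.cols, frame.type());
        h_out[i].create(frame.rows, frame.cols, CV_8UC1);

        Mat staged = h_in[i].createMatHeader();
        frame.copyTo(staged);                  // stage the frame in pinned memory

        streams[i].enqueueUpload(h_in[i], d_src[i]);                    // async H->D copy
        gpu::cvtColor(d_src[i], d_dst[i], CV_BGR2GRAY, 0, streams[i]);  // example GPU op
        streams[i].enqueueDownload(d_dst[i], h_out[i]);                 // async D->H copy
        busy[i] = true;
    }
    // (frames still in flight when the loop exits are not collected in this sketch)
    return 0;
}
```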
But it occurs to me that I may not need the setDevice call to select a particular GPU device; the Stream object may handle that for me.
Maybe I'm missing it, but it seems like the documentation doesn't address this issue.
I'm also thinking I probably ought to use CudaMem objects in place of Mat objects, because I'm guessing that uploads to and downloads from these are faster. Is this correct?
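For reference, this is how I understand CudaMem would be used (again assuming the 2.4 "gpu" API; the buffer size here is just a placeholder):

```cpp
// Page-locked (pinned) host buffer; from the docs, Stream's async
// enqueueUpload/enqueueDownload want the host side to point at memory like this
// rather than a plain heap-allocated Mat.
cv::gpu::CudaMem pinned(1080, 1920, CV_8UC3, cv::gpu::CudaMem::ALLOC_PAGE_LOCKED);
cv::Mat frameHeader = pinned.createMatHeader();  // ordinary Mat view over the same pinned memory
```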
Any help would be greatly appreciated.