1 | initial version |
In OpenCV, streams are effective for asynchronous data transfer if, as suggested by @pwuertz, you pin the memory first. This can be achieved with either cv::cuda::HostMem or cv.cuda.registerPageLocked.
They are also effective for overlapping host and device computation (provided the OpenCV function doesn't have its own fixed host/device sync points for intermediate calculations), not just for multithreaded computation.
See Accelerating OpenCV with CUDA streams in Python for an overview of how they can be used to optimize a single-threaded toy problem.
2 | No.2 Revision |
In OpenCV, streams are effective for asynchronous data transfer if, as suggested by @pwuertz, you pin the memory first. This can be achieved with either cv::cuda::HostMem or cv::cuda::registerPageLocked.
They are also effective for overlapping host and device computation (provided the OpenCV function doesn't have its own fixed host/device sync points for intermediate calculations), not just for multithreaded computation.
See Accelerating OpenCV with CUDA streams in Python for an overview of how they can be used to optimize a single-threaded toy problem.
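As a rough illustration of the two points above (pin first, then overlap host work with the transfer), here is a minimal Python sketch. It assumes an OpenCV build with CUDA support and uses the Python binding names as I understand them (cv.cuda_Stream, cv.cuda_GpuMat, cv.cuda.registerPageLocked). The helper name roundtrip_async is hypothetical, and the host-side sum is only a stand-in for real CPU work.

```python
# Guarded import so the sketch degrades gracefully on builds without CUDA.
try:
    import cv2 as cv
    import numpy as np
    HAVE_CUDA = cv.cuda.getCudaEnabledDeviceCount() > 0
except (ImportError, AttributeError):
    HAVE_CUDA = False


def roundtrip_async(frame):
    """Hypothetical helper: copy `frame` to the GPU and back on one stream.

    Returns the downloaded copy, or None when no CUDA device is usable.
    """
    if not HAVE_CUDA:
        return None

    frame = np.ascontiguousarray(frame)  # page-locking expects continuous memory
    result = np.empty_like(frame)
    stream = cv.cuda_Stream()
    gpu = cv.cuda_GpuMat()

    # Pin both host buffers; with pageable memory the copies may behave synchronously.
    cv.cuda.registerPageLocked(frame)
    cv.cuda.registerPageLocked(result)
    try:
        gpu.upload(frame, stream)     # enqueued on the stream
        gpu.download(stream, result)  # enqueued behind the upload

        # Host work placed here can overlap with the transfers above
        # (stand-in for real CPU work; it only reads `frame`).
        host_side_sum = int(frame.sum())

        stream.waitForCompletion()    # `result` is only safe to read after this
    finally:
        cv.cuda.unregisterPageLocked(frame)
        cv.cuda.unregisterPageLocked(result)
    return result
```

The host buffers must stay alive and unmodified until waitForCompletion returns. Any OpenCV CUDA function that accepts a stream argument could be queued between the upload and the download in the same way, subject to the caveat about internal sync points.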