Does parallel_for_ block until all threads complete?
I'm having a problem with CUDA calls inside parallel_for_ using GpuMats and streams. The following code does not work: when b_hist_[i] is downloaded, it contains data from another thread. Calling waitForCompletion() outside the parallel_for_ or after each calcHist() call makes no difference. The same code works fine without the parallel_for_ (running serially):
cv::parallel_for_(cv::Range(0, numImages), [&](const cv::Range& range)
{
    for (auto i = range.start; i < range.end; i++)
    //for (int i = 0; i < numImages; i++)
    {
        cv::cuda::calcHist(bgr_planes_[i][0], b_hist_[i], streams[i]);
        cv::cuda::calcHist(bgr_planes_[i][1], g_hist_[i], streams[i]);
        cv::cuda::calcHist(bgr_planes_[i][2], r_hist_[i], streams[i]);
        streams[i].waitForCompletion();
    }
});
Are the separate calcHist() calls interfering with each other? How can I ensure the GPU processes them separately?