Does parallel_for_ block until all threads complete?

I am having a problem with CUDA calls inside parallel_for_ when using GpuMats and streams. The following code does not work: when b_hist_[i] is downloaded, it contains data from another thread. Calling waitForCompletion() outside the parallel_for_, or after each calcHist() call, makes no difference. The code works fine without the parallel_for_ (running serially).

cv::parallel_for_(cv::Range(0, numImages), [&](const cv::Range& range)
{
    for (auto i = range.start; i < range.end; i++)
    //for (int i = 0; i < numImages; i++)
    {
        cv::cuda::calcHist(bgr_planes_[i][0], b_hist_[i], streams[i]);
        cv::cuda::calcHist(bgr_planes_[i][1], g_hist_[i], streams[i]);
        cv::cuda::calcHist(bgr_planes_[i][2], r_hist_[i], streams[i]);
        streams[i].waitForCompletion();

    }
});

Are the separate calcHist() calls interfering with each other? How can I ensure that the GPU processes them separately?

Revision 2 (retagged), updated 2020-10-06 04:00:55 -0600, berak
