Thread-creation with parallel_for

asked 2015-08-10 12:37:53 -0500

updated 2015-09-07 14:23:50 -0500


I'm trying to speed up an application and want to see if parallel_for_ can help me. As a first step, I wrote a small program that finds the maximum value in each column of an image and marks its position. It looks like this:

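class Parallel_markMax : public cv::ParallelLoopBody
{
    cv::Mat &img_rgb_;  // color image the markers are drawn on
    cv::Mat &img_;      // single-channel 8-bit image that is searched

public:
    Parallel_markMax(Mat& img, Mat& img_rgb)
        : img_rgb_(img_rgb), img_(img) {}

    virtual void operator()( const Range& range ) const {

        int h = img_.rows;
        // Debug output: shows which part of the range this call received
        cout << "hell " << range.start << "  "  << range.size() << endl;

        for (int x = range.start; x < range.end; ++x){

            // Find the row with the largest value in column x
            uchar max_val = 0;
            int pos = 0;
            for (int y = 0; y < h; ++y){
                if (img_.at<uchar>(y,x) > max_val){
                    max_val = img_.at<uchar>(y,x);
                    pos = y;
                }
            }

            cv::circle(img_rgb_, cv::Point(x,pos),3,cv::Scalar(255,0,0),1);
        }
    }
};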

And I call it like this:

parallel_for_(Range(0,w), Parallel_markMax(img, img_col));

I have 8 threads (the result of getNumThreads), so I expected eight calls, each handling 1/8 of the range. Instead I get a huge number of calls to operator(), each with a range size of only 1 to 10. So rather than giving each thread one big task, the thread manager hands out many very small tasks, which probably causes a lot of overhead. In my example, I only get a speedup of 3 with 8 cores, which is rather bad for a perfectly parallelizable task.

Is there a parameter I'm missing?
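For reference, the per-column work can be sketched without OpenCV, which makes it easy to check that a parallel run produces the same positions as a serial one. This is only a minimal sketch: columnMaxRows, the row-major pixel buffer, and the maxRow output vector are hypothetical names that are not part of the code above, and the drawing step is replaced by storing the row index.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the loop body: for each column x in [start, end),
// store the row index of the first maximum value found in that column.
// `pixels` is assumed to be a row-major 8-bit image with rows * cols elements.
void columnMaxRows(const std::vector<unsigned char>& pixels, int rows, int cols,
                   int start, int end, std::vector<int>& maxRow) {
    assert(pixels.size() == static_cast<std::size_t>(rows) * cols);
    assert(0 <= start && start <= end && end <= cols);
    assert(maxRow.size() == static_cast<std::size_t>(cols));

    for (int x = start; x < end; ++x) {
        unsigned char maxVal = 0;
        int pos = 0;
        for (int y = 0; y < rows; ++y) {
            unsigned char v = pixels[static_cast<std::size_t>(y) * cols + x];
            if (v > maxVal) {  // strict comparison keeps the first maximum
                maxVal = v;
                pos = y;
            }
        }
        maxRow[x] = pos;  // each column writes only its own slot
    }
}
```

Because every column writes only to its own element of maxRow, calls on disjoint ranges do not interfere with each other, no matter how the range is divided.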

answered 2015-08-10 13:16:06 -0500

updated 2015-08-10 13:42:25 -0500

Alright, I figured it out with the help of a colleague:

parallel_for_(Range(0,w), Parallel_markMax(img, img_col),12);

You can pass an additional third parameter (nstripes) that controls how many pieces the range is split into, and therefore the size of the individual tasks. In this case, the range is split into 12 pieces, which gives two tasks for each of six of my eight cores. Two cores are normally busy with other stuff, so this way I don't have to wait for the last two tasks to finish when the other six cores are much faster.
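To get a feel for the task sizes this produces, here is a rough sketch of splitting a range into a fixed number of near-equal pieces. It is not OpenCV's actual splitting code: splitIntoStripes is a hypothetical helper, and the real backend may divide the range differently.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical helper (not part of OpenCV): split [start, end) into at most
// `nstripes` contiguous chunks of near-equal size.
std::vector<std::pair<int, int>> splitIntoStripes(int start, int end, int nstripes) {
    assert(nstripes > 0 && start <= end);
    const long long len = end - start;
    const long long n = std::min<long long>(nstripes, len);  // no empty chunks

    std::vector<std::pair<int, int>> stripes;
    for (long long i = 0; i < n; ++i) {
        // Integer division spreads the remainder across the chunks.
        int s = start + static_cast<int>(len * i / n);
        int e = start + static_cast<int>(len * (i + 1) / n);
        stripes.emplace_back(s, e);
    }
    return stripes;
}
```

For an image that is 640 columns wide (an assumed width, for illustration only), splitIntoStripes(0, 640, 12) yields twelve chunks of 53 or 54 columns each, compared with the chunks of 1 to 10 columns seen in the question.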

