Thread-creation with parallel_for

Hey!

I'm trying to speed up an application and want to see whether parallel_for_ can help me. As a first step, I wrote a small program that finds the position of the maximum value in each column of an image and marks it. It looks like this:

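class Parallel_markMax : public cv::ParallelLoopBody
{
private:
    cv::Mat &img_rgb_;  // color image the markers are drawn into
    cv::Mat &img_;      // 8-bit single-channel image that is searched

public:
    Parallel_markMax(cv::Mat &img, cv::Mat &img_rgb) :
        img_rgb_(img_rgb),
        img_(img)
    {}

    // Called once per sub-range; range holds column indices.
    virtual void operator()(const cv::Range &range) const
    {
        const int h = img_.rows;
        // Debug output: start and size of the sub-range handed to this call.
        std::cout << "hell " << range.start << "  " << range.size() << std::endl;

        for (int x = range.start; x < range.end; ++x) {
            // Find the row with the largest value in column x.
            uchar max_val = 0;
            int pos = 0;
            for (int y = 0; y < h; ++y) {
                const uchar v = img_.at<uchar>(y, x);
                if (v > max_val) {
                    max_val = v;
                    pos = y;
                }
            }

            // Mark the maximum of this column in the color image.
            cv::circle(img_rgb_, cv::Point(x, pos), 3, cv::Scalar(255, 0, 0), 1);
        }
    }
};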

And I call it like this:

parallel_for_(Range(0,w), Parallel_markMax(img, img_col));

I have 8 threads (according to getNumThreads()), so I expected eight calls, each covering 1/8 of the range. Instead, operator() is called a huge number of times, each time with a range of only 1 to 10 columns. So rather than handing each thread one large task, the thread manager hands out many very small tasks, which probably adds a lot of overhead. In my example I only get a speedup of 3 on 8 cores, which is rather poor for a perfectly parallelizable task.

Is there a parameter I am missing?
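To make the expected behaviour concrete, here is a minimal sketch of the even split described above: a range of w columns divided into 8 contiguous stripes. It uses only the standard library; splitRange is a hypothetical helper written for illustration and is not part of OpenCV, nor a claim about how OpenCV's backends partition the range internally.

```cpp
#include <algorithm>
#include <utility>
#include <vector>

// Hypothetical helper (not an OpenCV function): split [start, end) into at
// most `nstripes` contiguous, near-equal sub-ranges, returned as
// (begin, end) pairs with an exclusive end.
std::vector<std::pair<int, int>> splitRange(int start, int end, int nstripes)
{
    std::vector<std::pair<int, int>> stripes;
    const int total = end - start;
    if (total <= 0 || nstripes <= 0)
        return stripes;                      // nothing to split

    nstripes = std::min(nstripes, total);    // never create empty stripes
    const int base  = total / nstripes;      // minimum stripe length
    const int extra = total % nstripes;      // first `extra` stripes get one more element

    int pos = start;
    for (int i = 0; i < nstripes; ++i) {
        const int len = base + (i < extra ? 1 : 0);
        stripes.emplace_back(pos, pos + len);
        pos += len;
    }
    return stripes;
}
```

For example, splitRange(0, 640, 8) yields eight stripes of 80 columns each. As far as I know, parallel_for_ itself takes an optional third argument, nstripes (default -1), so a call such as parallel_for_(Range(0,w), Parallel_markMax(img, img_col), 8) may be worth trying; how strictly that hint is honoured probably depends on the parallel backend OpenCV was built with.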
