Thread creation with parallel_for_
Hey!
I'm trying to speed up an application and want to see if parallel_for_ can help me. As a first step, I wrote a fancy program that finds the maximum value in each column of an image and marks its position. It looks like this:
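    #include <iostream>
    #include <opencv2/core.hpp>
    #include <opencv2/imgproc.hpp>

    class Parallel_markMax : public cv::ParallelLoopBody
    {
    private:
        cv::Mat &img_rgb_;  // color image the markers are drawn into
        cv::Mat &img_;      // 8-bit grayscale input image
    public:
        Parallel_markMax(cv::Mat &img, cv::Mat &img_rgb):
            img_rgb_(img_rgb),
            img_(img)
        {}
        // Called once per sub-range of columns handed out by parallel_for_.
        virtual void operator()(const cv::Range &range) const {
            int h = img_.rows;
            // Debug output: start and size of the sub-range for this call.
            std::cout << "hell " << range.start << " " << range.size() << std::endl;
            for (int x = range.start; x < range.end; ++x) {
                // Find the row of the first maximum in column x.
                uchar max_val = 0;
                int pos = 0;
                for (int y = 0; y < h; ++y) {
                    if (img_.at<uchar>(y, x) > max_val) {
                        max_val = img_.at<uchar>(y, x);
                        pos = y;
                    }
                }
                // Mark the maximum with a small circle in the color image.
                cv::circle(img_rgb_, cv::Point(x, pos), 3, cv::Scalar(255, 0, 0), 1);
            }
        }
    };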
And I call it like this:
    cv::parallel_for_(cv::Range(0, w), Parallel_markMax(img, img_col));
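The per-column search itself can be sketched without OpenCV as a plain serial loop, which gives a baseline for checking the result and measuring the speedup. This is only a minimal sketch: it assumes a row-major 8-bit grayscale buffer instead of a cv::Mat, and `columnArgmax` is a hypothetical name for illustration, not an OpenCV function.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not part of OpenCV): for every column x of a
// row-major 8-bit grayscale image, return the row index of the first maximum.
std::vector<int> columnArgmax(const std::vector<std::uint8_t>& pixels,
                              int rows, int cols)
{
    assert(rows >= 0 && cols >= 0);
    assert(pixels.size() == static_cast<std::size_t>(rows) * cols);

    std::vector<int> pos(cols, 0);
    for (int x = 0; x < cols; ++x) {
        std::uint8_t maxVal = 0;
        for (int y = 0; y < rows; ++y) {
            const std::uint8_t v = pixels[static_cast<std::size_t>(y) * cols + x];
            if (v > maxVal) {  // strict '>' keeps the first maximum, as in the class above
                maxVal = v;
                pos[x] = y;
            }
        }
    }
    return pos;
}
```

Timing this loop against the parallel_for_ call on the same image is one way to arrive at a speedup figure.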
I have 8 threads (as reported by getNumThreads), so I expected operator() to be called eight times, each call handling 1/8 of the range. Instead, operator() is called a huge number of times, each with a range of only 1 to 10 columns. So rather than handing each thread one large chunk, some thread manager assigns many very small tasks, which probably causes a lot of overhead. In my example I only get a speedup of 3 with 8 cores, which is rather poor for a perfectly parallelizable task.
Is there a parameter I'm missing?
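To make the expectation concrete, here is a minimal sketch of the even split I had in mind: the range is cut into one contiguous chunk per thread, with chunk sizes differing by at most one. `splitRange` is a hypothetical helper written for illustration; it is not part of OpenCV and says nothing about how parallel_for_ actually partitions the work.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical helper: split [begin, end) into `parts` contiguous
// half-open chunks [first, second) whose sizes differ by at most one.
std::vector<std::pair<int, int>> splitRange(int begin, int end, int parts)
{
    assert(parts > 0 && begin <= end);

    const int total = end - begin;
    const int base  = total / parts;  // minimum chunk size
    const int extra = total % parts;  // the first `extra` chunks get one more element

    std::vector<std::pair<int, int>> chunks;
    chunks.reserve(parts);
    int start = begin;
    for (int i = 0; i < parts; ++i) {
        const int size = base + (i < extra ? 1 : 0);
        chunks.emplace_back(start, start + size);
        start += size;
    }
    return chunks;
}
```

For example, with an assumed image width of 640 and 8 threads, `splitRange(0, 640, 8)` yields eight chunks of 80 columns each, which is the kind of call pattern I expected to see in the debug output.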