
How to use parallel_for?

asked 2012-11-04 02:02:30 -0600

Michael Burdinov

In the release notes for version 2.4.3 I discovered that OpenCV has a built-in parallel_for, but I can't find any documentation for it. Does anyone know of documentation or usage examples?


Comments

3

ATTENTION: If you are new to this (like me), note that the new implementation is "parallel_for_" (with a trailing underscore!), not "parallel_for"! Otherwise it will just run your loop serially! AHH!

Thanks to @Daniil Osokin and especially @Vladislav Vinogradov for pointing this out!

blorgggg (2012-12-07 07:06:05 -0600)

Thanks for pointing this out. I missed it myself.

Michael Burdinov (2012-12-07 10:03:16 -0600)

3 Answers

13

answered 2012-11-04 04:32:02 -0600

alexcv
http://sites.google.com/s...

updated 2012-11-04 04:49:43 -0600

Hi,

As shown by Vladislav, you only need to derive from the cv::ParallelLoopBody class to make your own loop body.

To complete this and answer Q3 (Q1 and Q2 may be related to includes; you should give more details about the errors you get): if you need to process local memory buffers or other data in parallel, give your class a constructor that stores pointers to those buffers, so that they are available when operator() is called.

Here is some sample code I use that may help you. It is a simple loop that clips buffer values to given maximum and minimum values. I use plain arrays of any type here, via templates. You can switch to std::vector, cv::Mat or anything else; just keep in mind that you need private members that point to the beginning of the data buffer you want to process.

In the constructor, you specify which buffer to process and, optionally, which constants to take into account. Once that is done, everything is ready to run in parallel. In the operator() method, create new local pointers to the target block range.

Hope it helps.

Regards

Alex

#include <opencv2/core/core.hpp>

template <class type>
class Parallel_clipBufferValues: public cv::ParallelLoopBody
{
private:
  type *bufferToClip;       // start of the buffer to process
  type minValue, maxValue;  // clipping bounds

public:
  Parallel_clipBufferValues(type* bufferToProcess, const type min, const type max)
    : bufferToClip(bufferToProcess), minValue(min), maxValue(max){}

  // Called once per sub-range; r is half-open: [r.start, r.end)
  virtual void operator()( const cv::Range &r ) const {
    // 'register' dropped: it is deprecated and no longer valid in C++17
    type *inputOutputBufferPTR=bufferToClip+r.start;
    for (int jf = r.start; jf != r.end; ++jf, ++inputOutputBufferPTR)
    {
        if (*inputOutputBufferPTR>maxValue)
            *inputOutputBufferPTR=maxValue;
        else if (*inputOutputBufferPTR<minValue)
            *inputOutputBufferPTR=minValue;
    }
  }
};

Finally, how to use it :

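const int SIZE=10;
int myTab[SIZE] = {-5, 0, 100, 255, 300, 42, -1, 256, 7, 1000};  // sample data
int minVal=0, maxVal=255;
// cv::Range is half-open, so (0, SIZE) covers all SIZE elements
cv::parallel_for_(cv::Range(0,SIZE), Parallel_clipBufferValues<int>(myTab, minVal, maxVal));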
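
To make the role of the range more concrete, here is a rough, OpenCV-free sketch of what a parallel_for_-style driver does with a loop body. Everything in it is an assumption for illustration: `Range` and `ParallelLoopBody` are small stand-ins for the cv:: types, `run_in_stripes` and `ClipBody` are hypothetical names, and the stripes run one after another here, whereas a real backend such as TBB would hand them to worker threads. It shows two things: the range is half-open, so covering N elements needs (0, N), and each call to operator() must only touch its own sub-range.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Stand-ins for cv::Range and cv::ParallelLoopBody (assumptions, so that the
// sketch builds without OpenCV). The range is half-open: [start, end).
struct Range {
    int start, end;
    Range(int s, int e) : start(s), end(e) {}
};

struct ParallelLoopBody {
    virtual ~ParallelLoopBody() {}
    virtual void operator()(const Range& r) const = 0;
};

// Hypothetical driver: cuts [start, end) into contiguous stripes and calls
// the body once per stripe. Stripes run sequentially in this sketch.
void run_in_stripes(const Range& whole, const ParallelLoopBody& body, int nstripes) {
    assert(nstripes > 0);
    const int total = whole.end - whole.start;
    if (total <= 0) return;
    const int chunk = (total + nstripes - 1) / nstripes;  // ceiling division
    for (int s = whole.start; s < whole.end; s += chunk) {
        body(Range(s, std::min(s + chunk, whole.end)));
    }
}

// Same pattern as the clipping body: the constructor records the buffer and
// the limits, operator() only touches the elements of its own stripe.
class ClipBody : public ParallelLoopBody {
public:
    ClipBody(int* buf, int lo, int hi) : buf_(buf), lo_(lo), hi_(hi) {}
    void operator()(const Range& r) const override {
        for (int i = r.start; i < r.end; ++i)
            buf_[i] = std::min(std::max(buf_[i], lo_), hi_);
    }
private:
    int* buf_;
    int lo_, hi_;
};

// Fills ten values from -50 to 850 in steps of 100 and clips them to [0, 255].
std::vector<int> clip_demo() {
    std::vector<int> v(10);
    for (int i = 0; i < 10; ++i) v[i] = i * 100 - 50;
    ClipBody body(v.data(), 0, 255);
    run_in_stripes(Range(0, static_cast<int>(v.size())), body, 3);
    return v;
}
```

With three stripes over ten elements the body is called for [0,4), [4,8) and [8,10), so the last element is processed only because the range ends at 10 rather than 9.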

Comments

1

Now I understand it better. Thank you for detailed answer.

Michael Burdinov (2012-11-04 05:39:31 -0600)

How could I use this on a Mat, if I need to know the 2D coordinates of each element?

Gianluigi (2012-11-05 22:04:57 -0600)

Gianluigi, you can call parallel_for_ for each row of the Mat, like you would with a usual 'for' loop. And of course, if the Mat's buffer is continuous in memory, all of this can be done in a single call.

Michael Burdinov (2012-11-06 00:25:52 -0600)

So I need to do it like this? for (int j=0; j<image.rows; j++) { parallel_for_(cv::Range(0,image.cols), Parallel_clipBufferValues<uchar>(image.ptr<uchar>(j), minVal, maxVal)); }

Gianluigi (2012-11-06 23:39:12 -0600)

Yes, I guess this is the way.

Michael Burdinov (2012-11-07 00:45:56 -0600)

Yes, this is the way. For matrices with more than one dimension, "manually" loop over the first dimensions with 'for' and process the last dimension in parallel. This is one way that is efficient on multicore systems, but other solutions are also possible.

alexcv (2012-11-07 03:36:33 -0600)

Is there anything extra one needs to do to make sure it uses more than one processor? I implemented all this, and it runs fine, just not any faster than a serial loop.

blorgggg (2012-12-06 08:13:22 -0600)

@blorgggg To get a real speedup, you need "heavy" computation in each iteration of the loop. Look at cvtColor: it is now parallelized with parallel_for_ and gets a good speedup.

Daniil Osokin (2012-12-07 00:24:22 -0600)

Hi @alexcv and @Michael. Is it possible to specify the number of processors? Thanks.

Nesbit (2014-09-18 20:29:07 -0600)
1
4

answered 2012-11-04 02:28:59 -0600

Vladislav Vinogradov

updated 2012-12-07 00:10:58 -0600

class Body : public cv::ParallelLoopBody
{
public:
    void operator ()(const cv::Range& range) const
    {
        for (int i = range.start; i < range.end; ++i)
            ...
    }
};
Body body;
cv::parallel_for_(cv::Range(0, count), body);

Comments

Thank you for your answer. Now I have more questions :). (I am still using 2.4.2, so please tell me if my questions are not relevant in 2.4.3).

Michael Burdinov (2012-11-04 03:27:36 -0600)

Q1: My code won't compile. It expects BlockedRange instead of Range. What is this BlockedRange?

Michael Burdinov (2012-11-04 03:29:19 -0600)

Q2: ParallelLoopBody is not found. What is it and why is it needed?

Michael Burdinov (2012-11-04 03:30:31 -0600)

Q3: How can it access local variables if this parallel_for loop requires the use of some outside class?

Michael Burdinov (2012-11-04 03:33:01 -0600)
0

answered 2012-12-06 08:11:52 -0600

blorgggg

I tried this out myself and got everything to compile and run, but the program never uses more than one processor. Is there something special you need to do, other than having OpenCV 2.4.3 compiled with TBB enabled?

Here is some sample code (my actual loop body was much more substantial; this was just a test to confirm that, no matter what, it wasn't using more than one core):

class Parallel_Test : public cv::ParallelLoopBody
{
private:
    double* const mypointer;

public:
    Parallel_Test(double* pointer)
        : mypointer(pointer) {}

    void operator()(const Range& range) const
    {
        // This operator needs to be here, otherwise the class is considered abstract.
        // qDebug() << "This should never be called";
    }

    void operator()(const cv::BlockedRange& range) const
    {
        for (int x = range.begin(); x < range.end(); ++x) {
            mypointer[x] = x;
        }
    }
};

// TODO: loop over pixels in parallel
double t = (double)getTickCount();

// Test parallel looping at all
double data1[1000000];

cv::parallel_for(BlockedRange(0, 1000000), Parallel_Test(data1));

t = ((double)getTickCount() - t) / getTickFrequency();
qDebug() << "Parallel TEST time " << t << endl;

t = (double)getTickCount();

for (int i = 0; i < 1000000; i++) {
    data1[i] = i;
}
t = ((double)getTickCount() - t) / getTickFrequency();
qDebug() << "SERIAL Scan time " << t << endl;

Comments

This should be posted as a separate question. The short answer is in the comments on the original question.

Daniil Osokin (2012-12-07 01:22:50 -0600)


Stats

Asked: 2012-11-04 02:02:30 -0600

Seen: 9,886 times

Last updated: Dec 07 '12