
How to use parallel_for?

asked 2012-11-04 02:02:30 -0600

Michael Burdinov

In the release notes for version 2.4.3 I discovered that OpenCV has a built-in parallel_for, but I can't find any documentation for it. Does anyone know of documentation or usage examples?




ATTENTION: If you are new to this (like me), note that the new implementation is "parallel_for_" (with a trailing underscore!), not "parallel_for"!!! Otherwise it will just run your loop in serial! AHH!

Thanks to @Daniil Osokin and especially @Vladislav Vinogradov for pointing this out!

blorgggg ( 2012-12-07 07:06:05 -0600 )

Thanks for the point. I missed it myself.

Michael Burdinov ( 2012-12-07 10:03:16 -0600 )

Hi sir, how can I use parallel_for to reduce the processing time of QR code processing?

Rthi ( 2017-11-30 01:11:02 -0600 )

@Rthi, there is no reason to revive 5-year-old topics that have nothing to do with QR code processing.

StevenPuttemans ( 2017-12-01 08:01:23 -0600 )

3 answers


answered 2012-11-04 04:32:02 -0600

alexcv

updated 2012-11-04 04:49:43 -0600


As shown by Vladislav, you only need to derive from the cv::ParallelLoopBody class to make your own loop body.

To complete his answer and address Q3 (the earlier questions may be related to includes; you should give more details about the errors you encountered): if you need to process local memory buffers or other data in parallel, you need a constructor that stores pointers to the buffers to process when operator() is called.

Here is some sample code I use that may help you. It is a simple loop that clips buffer values to minimum and maximum values. I use plain arrays of any type here, via templates. You can switch to std::vector, cv::Mat or anything else; just keep in mind that you have to create private members that point to the beginning of the data buffer you want to process.

In the constructor, you specify which buffer to process and, if needed, which constants to take into account. Once that is done, everything is ready to run in parallel. In the operator() method, create new local pointers to the target block range.

Hope it helps.



template <class type>
class Parallel_clipBufferValues: public cv::ParallelLoopBody
{
private:
  type *bufferToClip;
  type minValue, maxValue;

public:
  Parallel_clipBufferValues(type* bufferToProcess, const type min, const type max)
    : bufferToClip(bufferToProcess), minValue(min), maxValue(max){}

  // Clips the elements in [r.start, r.end) of the buffer to [minValue, maxValue].
  virtual void operator()( const cv::Range &r ) const {
    type *inputOutputBufferPTR=bufferToClip+r.start;
    for (int jf = r.start; jf != r.end; ++jf, ++inputOutputBufferPTR)
    {
        if (*inputOutputBufferPTR>maxValue)
            *inputOutputBufferPTR=maxValue;
        else if (*inputOutputBufferPTR<minValue)
            *inputOutputBufferPTR=minValue;
    }
  }
};

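Finally, how to use it:

const int SIZE=10;
int myTab[SIZE];  // fill with the values to clip before the call
int minVal=0, maxVal=255;
// cv::Range is half-open, so [0, SIZE) covers every element of myTab
cv::parallel_for_(cv::Range(0,SIZE), Parallel_clipBufferValues<int>(myTab, minVal, maxVal));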



Now I understand it better. Thank you for detailed answer.

Michael Burdinov ( 2012-11-04 05:39:31 -0600 )

How could I use this on a Mat if I need 2D coordinate information?

Gianluigi ( 2012-11-05 22:04:57 -0600 )

Gianluigi, you can call the parallel_for_ loop for each row of the Mat, as you would with a usual 'for' loop. And of course, if the Mat's buffer is continuous in memory, all of this can be done in a single call.

Michael Burdinov ( 2012-11-06 00:25:52 -0600 )

So I need to do it like this? for (int j=0; j<image.rows; j++) { parallel_for_(cv::Range(0,image.cols), Parallel_clipBufferValues<uchar>(image.ptr<uchar>(j), minVal, maxVal)); }

Gianluigi ( 2012-11-06 23:39:12 -0600 )

Yes, I guess this is the way.

Michael Burdinov ( 2012-11-07 00:45:56 -0600 )

Yes, this is the way. For matrices with more than one dimension, do 'for' loops "manually" over the first dimensions and process the last dimension in parallel. This approach is efficient on multicore systems, but other solutions are also possible.

alexcv ( 2012-11-07 03:36:33 -0600 )

Is there anything extra one needs to do to make sure it uses more than one processor? I implemented all this, and it runs fine, just not any faster than a serial loop.

blorgggg ( 2012-12-06 08:13:22 -0600 )

@blorgggg To get a real speedup, you need "heavy" computation in each loop iteration. Look at cvtColor: it is now parallelized with parallel_for_ and gets a good speedup.

Daniil Osokin ( 2012-12-07 00:24:22 -0600 )

Hi @alexcv and @Michael. Is it possible to specify the number of processors? Thanks.

Nesbit ( 2014-09-18 20:29:07 -0600 )

answered 2012-11-04 02:28:59 -0600

Vladislav Vinogradov

updated 2012-12-07 00:10:58 -0600

class Body : public cv::ParallelLoopBody {
public:
    void operator ()(const cv::Range& range) const {
        for (int i = range.start; i < range.end; ++i) {
            // process element i here
        }
    }
};
Body body;
cv::parallel_for_(cv::Range(0, count), body);


Thank you for your answer. Now I have more questions :). (I am still using 2.4.2, so please tell me if my questions are not relevant in 2.4.3).

Michael Burdinov ( 2012-11-04 03:27:36 -0600 )

Q1: My code won't compile. It expects BlockedRange instead of Range. What is this BlockedRange?

Michael Burdinov ( 2012-11-04 03:29:19 -0600 )

Q2: ParallelLoopBody is not found. What is it, and why is it needed?

Michael Burdinov ( 2012-11-04 03:30:31 -0600 )

Q3: How can it access local variables if this parallel_for loop requires the use of some outside class?

Michael Burdinov ( 2012-11-04 03:33:01 -0600 )

answered 2012-12-06 08:11:52 -0600

blorgggg

I tried this out myself and got everything to compile and run, but the program would never use more than one processor. Is there something special you need to do other than having OpenCV 2.4.3 compiled with the TBB option selected?

Here's some sample code (my actual loop body was much more substantial; this was just a test to confirm that it wasn't using more than one core no matter what):

class Parallel_Test : public cv::ParallelLoopBody
{
private:
    double* const mypointer;

public:
    Parallel_Test(double* pointer)
        : mypointer(pointer) {}

    void operator()(const Range& range) const
    {
        // This overload needs to be here, otherwise the class is considered abstract.
        // qDebug() << "This should never be called";
    }

    void operator()(const cv::BlockedRange& range) const
    {
        for (int x = range.begin(); x < range.end(); ++x) {
            // per-element work goes here
        }
    }
};

// TODO Loop pixels in parallel
double t = (double)getTickCount();

double data1[1000000];

cv::parallel_for(BlockedRange(0, 1000000), Parallel_Test(data1));

t = ((double)getTickCount() - t)/getTickFrequency();
qDebug() << "Parallel TEST time " << t << endl;

t = (double)getTickCount();

for (int i = 0; i < 1000000; i++) {
    // per-element work goes here
}

t = ((double)getTickCount() - t)/getTickFrequency();
qDebug() << "SERIAL Scan time " << t << endl;


This should be posted as a separate question. The short answer is in the comments on the original question.

Daniil Osokin ( 2012-12-07 01:22:50 -0600 )




Asked: 2012-11-04 02:02:30 -0600

Seen: 36,369 times

Last updated: Nov 30 '17