OpenCV parallel_for does not use multiple processors

asked 2012-12-06 08:16:53 -0600

blorgggg

updated 2012-12-06 09:27:25 -0600

Vladislav Vinogradov

I just saw that the new OpenCV 2.4.3 added a universal parallel_for. So, following this example: http://answers.opencv.org/question/3730/how-to-use-parallel_for/

I tried to implement it myself. I got it all working with my code, but when I timed it against a similar loop written serially with a regular "for" statement, the parallel version was only insignificantly faster, and often a tiny bit slower!

I thought maybe this had something to do with my pushing into vectors or something (I'm a pretty big noob at parallel processing), so I set up a test loop that just runs through a big range of numbers, and it still doesn't work.

Check it out:

code:

class Parallel_Test : public cv::ParallelLoopBody
{
private:
    double* const mypointer;

public:
    Parallel_Test(double* pointer)
        : mypointer(pointer)
    {
    }

    void operator()(const Range& range) const
    {
        // This overload needs to be here, otherwise the class is considered abstract.
        // qDebug() << "This should never be called";
    }

    void operator()(const cv::BlockedRange& range) const
    {
        for (int x = range.begin(); x < range.end(); ++x) {
            mypointer[x] = x;
        }
    }
};


// TODO Loop pixels in parallel
double t = (double)getTickCount();

// Test parallel looping at all
double data1[1000000];

cv::parallel_for(BlockedRange(0, 1000000), Parallel_Test(data1));

t = ((double)getTickCount() - t) / getTickFrequency();
qDebug() << "Parallel TEST time " << t << endl;

t = (double)getTickCount();

for (int i = 0; i < 1000000; i++) {
    data1[i] = i;
}
t = ((double)getTickCount() - t) / getTickFrequency();
qDebug() << "SERIAL Scan time " << t << endl;

Here's the output:

Parallel TEST time 0.00415479

SERIAL Scan time 0.00204597

That example was just a test case. The actual loop I hope to parallelize normally takes about 1.5 seconds (I'm doing ICP registration over millions of 3D points), and parallel_for does not improve that at all. What's even more telling is that only one processor is ever in use at a time. Even if dispatching the threads were inefficient, the work should at least be spread across multiple cores. This leads me to believe that something is wrong.


1 answer

answered 2012-12-06 09:29:41 -0600

Vladislav Vinogradov

Use cv::parallel_for_, not cv::parallel_for (which is the old implementation):

class Parallel_Test : public cv::ParallelLoopBody
{
public:
    void operator()(const cv::Range& range) const
    {
        // your code
    }
};

cv::parallel_for_(cv::Range(0, 1000000), Parallel_Test());
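
The body contract used above (an operator() that processes only the range slice it is handed) can be tried without an OpenCV build. The following is a minimal sketch using standard C++11 threads, not OpenCV's implementation: Range, FillBody, parallel_for_sketch and fill_parallel are all hypothetical names made up for illustration, and the even striping is an assumption about how such a loop might be split.

```cpp
#include <algorithm>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical stand-in for cv::Range: a half-open interval [start, end).
struct Range {
    int start;
    int end;
};

// Same shape as the Parallel_Test body from the question:
// it writes only inside the slice it is given.
class FillBody {
public:
    explicit FillBody(double* pointer) : mypointer(pointer) {}

    void operator()(const Range& range) const {
        for (int x = range.start; x < range.end; ++x) {
            mypointer[x] = x;
        }
    }

private:
    double* const mypointer;
};

// Hypothetical helper, NOT OpenCV's implementation: split the range into
// roughly equal stripes and run the body on each stripe in its own thread.
template <typename Body>
void parallel_for_sketch(const Range& whole, const Body& body, int nstripes) {
    const int total = whole.end - whole.start;
    if (total <= 0) return;
    nstripes = std::max(1, std::min(nstripes, total));
    const int chunk = (total + nstripes - 1) / nstripes;  // ceiling division

    std::vector<std::thread> workers;
    for (int begin = whole.start; begin < whole.end; begin += chunk) {
        const Range stripe = {begin, std::min(begin + chunk, whole.end)};
        workers.emplace_back([&body, stripe] { body(stripe); });
    }
    for (std::thread& w : workers) w.join();  // wait for every stripe
}

// Fill n doubles with their own index, using the sketch above.
std::vector<double> fill_parallel(int n, int nstripes) {
    // Heap storage: 1,000,000 doubles is about 8 MB, which can exceed
    // the default stack size on some platforms.
    std::vector<double> data(static_cast<std::size_t>(std::max(n, 0)));
    parallel_for_sketch(Range{0, n}, FillBody(data.data()), nstripes);
    return data;
}
```

Calling fill_parallel(1000000, 4) would fill the same array as the test in the question using up to four threads. A loop this small may still show little or no speedup, since starting threads can cost about as much as the work itself.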
