Parallelization of linear part of opencv_traincascade

asked 2013-07-10 06:26:39 -0600

opencv_traincascade is parallelized using TBB, but not completely. There is a fairly large linear (single-threaded) part, notably in updateTrainingSet(). When training the 20th stage and looking for 10,000 negative samples (for example), every negative has to pass all 19 previous stages, so with the default maxFalseAlarmRate of 0.5 you have to search through roughly 2^19*10,000 samples, which takes a lot of time.
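As a rough check on that arithmetic: if every stage passes exactly the default `-maxFalseAlarmRate` of 0.5 of the negatives that reach it, a background window survives 19 stages with probability 0.5^19. Collecting 10,000 negatives for stage 20 then means scanning about 10,000 × 2^19 ≈ 5.2 billion windows. The sketch below only computes that estimate. The function name is hypothetical, and the "exactly 0.5 per stage" model is an assumption: stages are trained until their false alarm rate is *at most* the target, so this is closer to a lower bound.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper: expected number of candidate windows to scan in order
// to collect `numNeg` negatives that pass `trainedStages` earlier stages,
// ASSUMING every stage passes exactly `maxFalseAlarmRate` of the negatives
// it sees (real stages pass at most that fraction, so this underestimates).
double expectedWindowsScanned(int numNeg, int trainedStages, double maxFalseAlarmRate)
{
    assert(maxFalseAlarmRate > 0.0 && maxFalseAlarmRate <= 1.0);
    // Probability that a random background window survives all trained stages.
    const double acceptanceRatio = std::pow(maxFalseAlarmRate, trainedStages);
    return numNeg / acceptanceRatio;
}
```

For example, `expectedWindowsScanned(10000, 19, 0.5)` evaluates to about 5.24e9.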

Does anyone know if there is any intention to parallelize this part? (If not, I might do it myself.)

Comments

Hello! I've worked on an implementation of this, but it currently only supports GCD (macOS). I still have to look into how TBB is used, along with some other issues. Currently, if I disable TBB in OpenCV, it doesn't use GCD, so I'm trying to find a solution that works with either before posting a patch. There are some other bugs to fix too :) Please see: http://code.opencv.org/issues/3147

nessence (2013-07-10 08:00:06 -0600)

Hi, if you want to share the code, I might be able to port it to TBB (or better, to cv::parallel_for_).

Tal Darom (2013-07-11 09:14:46 -0600)
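The appeal of `cv::parallel_for_` for the GCD-versus-TBB problem is that the loop body is written once, as a functor over an index range, and OpenCV dispatches the chunks to whichever threading backend it was built with. The sketch below mimics that shape with plain C++11 `std::thread` so it runs without OpenCV. `IndexRange`, `LoopBody`, `parallelFor` and `SquareBody` are hypothetical stand-ins for illustration, not OpenCV code.

```cpp
#include <algorithm>
#include <cassert>
#include <thread>
#include <vector>

// Half-open index range [start, end); a hypothetical analogue of cv::Range.
struct IndexRange { int start, end; };

// Hypothetical analogue of cv::ParallelLoopBody: the body only sees a sub-range.
class LoopBody {
public:
    virtual ~LoopBody() {}
    virtual void operator()(const IndexRange& r) const = 0;
};

// Hypothetical analogue of cv::parallel_for_: splits the range into contiguous
// chunks, runs each chunk on its own std::thread, and blocks until all finish.
// Replacing this function with a TBB or GCD version would not change LoopBody.
void parallelFor(const IndexRange& range, const LoopBody& body, int numThreads = 4)
{
    assert(numThreads > 0);
    const int total = range.end - range.start;
    if (total <= 0) return;
    numThreads = std::min(numThreads, total);
    const int chunk = (total + numThreads - 1) / numThreads;
    std::vector<std::thread> workers;
    for (int s = range.start; s < range.end; s += chunk) {
        const IndexRange sub = { s, std::min(s + chunk, range.end) };
        workers.emplace_back([&body, sub]() { body(sub); });
    }
    for (std::thread& t : workers) t.join();
}

// Example body: each index is written by exactly one chunk, so no locking is needed.
class SquareBody : public LoopBody {
public:
    explicit SquareBody(std::vector<int>& out) : out_(out) {}
    void operator()(const IndexRange& r) const override {
        for (int i = r.start; i < r.end; ++i) out_[i] = i * i;
    }
private:
    std::vector<int>& out_;
};
```

The design point is the separation: the body knows nothing about threads, so the same body can be driven by any backend.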

Hello, Tal Darom! We face the same problem of searching the negative sample set taking too long, and are ready to start working on parallelizing opencv_traincascade with TBB. However, you might have already done this work. So, if you have finished it or are close to finishing it, would you be willing to share your code? If not, I will start on it myself (it seems that a parallelized implementation still hasn't appeared anywhere), but I wanted to ask first to avoid unnecessary work.

vskhitkov (2013-08-09 04:21:42 -0600)

answered 2013-10-15 14:55:36 -0600

matspetter

I made an attempt to solve this using parallel_for_, and it seems to work. At the top of cascadeclassifier.cpp I added:


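// Evaluates the stages in [r.start, r.end) for one sample and writes 0 to
// *result as soon as any stage rejects it. Note: *result is shared between
// threads without synchronization (every writer stores the same value, 0).
class Parallel_predict: public cv::ParallelLoopBody
{
private:
    const vector< Ptr<CvCascadeBoost> >& v;
    int idx;
    int * result;

public:
    Parallel_predict(const vector< Ptr<CvCascadeBoost> >& vectorToProcess, int i, int * r)
    : v(vectorToProcess), idx(i), result(r){ *result=1; }  // sample passes unless a stage rejects it

    virtual void operator()( const cv::Range &r ) const {
        // Stop early if another chunk has already rejected the sample
        for (int i = r.start ; (i!=r.end)&&(*result==1) ; ++i)
        {
            if (v[i]->predict(idx)==.0f)
            {
                *result=0;  // rejected by stage i
                return;
            }
        }
    }
};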

Then, also in cascadeclassifier.cpp, I updated the int CvCascadeClassifier::predict( int sampleIdx ) method to look like this:


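int CvCascadeClassifier::predict( int sampleIdx )
{
    CV_DbgAssert( sampleIdx < numPos + numNeg );

    // Evaluate all stages in parallel; result stays 1 unless some stage rejects the sample.
    // parallel_for_ returns only after every chunk has finished, so result is final here.
    int result;
    Parallel_predict p(stageClassifiers, sampleIdx, &result);
    cv::parallel_for_(cv::Range(0,(int)stageClassifiers.size()), p);
    return result;

    /* OLD CODE: sequential, stops at the first stage that rejects the sample
    for (vector< Ptr<CvCascadeBoost> >::iterator it = stageClassifiers.begin();
        it != stageClassifiers.end(); it++ )
    {
        if ( (*it)->predict( sampleIdx ) == 0.f )
            return 0;
    }
    return 1;*/
}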
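One caveat with a shared `int* result` like the one in `Parallel_predict`: several worker threads may read and write it at the same time. Since every writer stores 0 it tends to work in practice, but under the C++11 memory model it is still a data race. If the build can use C++11, a `std::atomic` flag makes the early exit well-defined. The sketch below is self-contained: `Stage` is a hypothetical placeholder for `CvCascadeBoost`, and plain `std::thread`s replace `cv::parallel_for_`, so the point is the flag, not the dispatch.

```cpp
#include <atomic>
#include <cassert>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical placeholder for one trained stage: returns true if the
// sample passes this stage (i.e. CvCascadeBoost::predict would be non-zero).
typedef std::function<bool(int sampleIdx)> Stage;

// Returns 1 if the sample passes every stage, 0 otherwise. Stages are split
// across threads; a shared atomic flag lets threads stop once any stage rejects.
int predictAllStages(const std::vector<Stage>& stages, int sampleIdx, int numThreads = 4)
{
    assert(numThreads > 0);
    std::atomic<bool> rejected(false);
    const int n = static_cast<int>(stages.size());
    std::vector<std::thread> workers;
    for (int t = 0; t < numThreads; ++t) {
        workers.emplace_back([&, t]() {
            // Strided split: thread t evaluates stages t, t + numThreads, ...
            for (int i = t; i < n && !rejected.load(std::memory_order_relaxed); i += numThreads) {
                if (!stages[i](sampleIdx)) {
                    rejected.store(true, std::memory_order_relaxed);
                    return;
                }
            }
        });
    }
    // join() synchronizes with each thread, so the final load sees every store.
    for (std::thread& w : workers) w.join();
    return rejected.load() ? 0 : 1;
}
```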

I have only tried this on OS X. It seems to work, and it definitely uses all of my cores :) There is, however, more code that could be parallelized (such as fillPassedSamples), but it is not obvious how to approach it.

/MB
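For the search for negatives itself (the work done around `fillPassedSamples`), one commonly used alternative is to split across candidate samples rather than across stages: each thread runs the ordinary sequential, early-exit cascade on its own block of candidates, keeps survivors in a private buffer, and the buffers are concatenated afterwards. This is a general pattern, not the approach of this answer or of OpenCV. In particular, the negative-image reader in traincascade steps through background images with internal state, so it would likely need per-thread state, which this sketch ignores by taking precomputed candidates. `Cascade`, `passesCascade` and `collectPassedSamples` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical cascade: a list of stage tests evaluated in order.
typedef std::vector<std::function<bool(int)>> Cascade;

// Ordinary sequential evaluation with early exit; typically most negatives
// are rejected within the first few stages, so this is cheap per sample.
bool passesCascade(const Cascade& cascade, int candidate)
{
    for (const auto& stage : cascade)
        if (!stage(candidate)) return false;
    return true;
}

// Evaluates candidates in parallel (one contiguous block per thread), keeps
// per-thread survivor lists, then concatenates them in block order so the
// result matches a sequential scan of `candidates`.
std::vector<int> collectPassedSamples(const Cascade& cascade,
                                      const std::vector<int>& candidates,
                                      int numThreads = 4)
{
    assert(numThreads > 0);
    const int n = static_cast<int>(candidates.size());
    const int chunk = (n + numThreads - 1) / numThreads;
    std::vector<std::vector<int>> perThread(numThreads);
    std::vector<std::thread> workers;
    for (int t = 0; t < numThreads; ++t) {
        workers.emplace_back([&, t]() {
            const int begin = t * chunk;
            const int end = std::min(n, begin + chunk);
            for (int i = begin; i < end; ++i)
                if (passesCascade(cascade, candidates[i]))
                    perThread[t].push_back(candidates[i]);  // private buffer, no lock
        });
    }
    for (std::thread& w : workers) w.join();
    std::vector<int> passed;
    for (const auto& part : perThread)
        passed.insert(passed.end(), part.begin(), part.end());
    return passed;
}
```

Each thread does a whole cascade evaluation per candidate, so the work per task is large compared with the threading overhead, which is the opposite of splitting a single sample's stages.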

Comments

Hmm, this solution does parallelize part of the code, but unfortunately it doesn't seem to help much, if at all, because of the nature of that part of the code :( So, back to the drawing board...

matspetter (2013-10-16 03:33:21 -0600)
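A back-of-the-envelope model suggests one plausible reason (an estimate, not a measurement). If each stage passes a fraction r of the negatives that reach it, the sequential early-exit loop evaluates 1 + r + r^2 + ... + r^(n-1) = (1 - r^n)/(1 - r) stages per candidate on average. With r = 0.5 (the default `-maxFalseAlarmRate`) and n = 19 that is just under 2, so each `predict()` call has only about two stages of real work to split across threads, while the parallel version still pays dispatch overhead on every call. The model assumes every candidate is a negative and that stages pass independently at exactly rate r; `expectedStagesEvaluated` is a hypothetical helper.

```cpp
#include <cassert>

// Hypothetical helper: expected number of stages a sequential, early-exit
// cascade evaluates per negative candidate, ASSUMING each stage independently
// passes a fraction `passRate` of the negatives that reach it.
double expectedStagesEvaluated(int numStages, double passRate)
{
    assert(numStages >= 0 && passRate >= 0.0 && passRate <= 1.0);
    double expected = 0.0;
    double reachProbability = 1.0;  // probability that stage k is reached
    for (int k = 0; k < numStages; ++k) {
        expected += reachProbability;  // stage k costs one evaluation if reached
        reachProbability *= passRate;
    }
    return expected;
}
```

For 19 stages at a pass rate of 0.5 this gives 2 - 2^-18, i.e. fewer than two stage evaluations per candidate.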