
Parallelization of linear part of opencv_traincascade

asked 2013-07-10 06:26:39 -0600

opencv_traincascade is parallelized using TBB, but not completely. There is a fairly large serial (non-parallelized) part, notably in updateTrainingSet(). When training the 20th stage and looking for 10,000 negative samples (for example), you have to search through roughly 2^19*10,000 candidate samples (assuming each of the 19 earlier stages passes about half of the negatives), which takes a lot of time.

Does anyone know if there is any intention to parallelize this part? (If not, I might do that myself.)
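
As a rough back-of-the-envelope check of that cost, here is a minimal sketch. It assumes every already-trained stage lets through a fixed fraction of negatives (0.5 here, the default value of -maxFalseAlarmRate); the helper name expectedCandidates is made up for illustration and is not part of OpenCV.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper (not an OpenCV function): estimates how many candidate
// windows must be run through the existing cascade to collect `needed`
// negatives for the next stage. Assumes each trained stage passes a fixed
// fraction `falseAlarm` of the negatives it sees, which is only an approximation.
double expectedCandidates(int trainedStages, double falseAlarm, double needed)
{
    assert(trainedStages >= 0 && falseAlarm > 0.0 && falseAlarm <= 1.0);
    // Only falseAlarm^trainedStages of all candidates survive every stage.
    return needed / std::pow(falseAlarm, trainedStages);
}
```

With 19 trained stages, a pass rate of 0.5 and 10,000 negatives wanted, expectedCandidates(19, 0.5, 10000) comes out to about 5.2 billion candidates, which is why this serial search dominates the later stages.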




Hello! I've worked on an implementation of this, but it currently only supports GCD (macOS). I still have to look into how TBB is used and some other issues. Currently, if I disable TBB in OpenCV, it doesn't use GCD, so I'm trying to find a solution that works with either before posting a patch. There are some other bugs to be fixed too :) Please see:

nessence ( 2013-07-10 08:00:06 -0600 )

Hi, if you want to share the code I might be able to port it to TBB (or better - to cv::parallel_for_)

Tal Darom ( 2013-07-11 09:14:46 -0600 )

Hello, Tal Darom! We face the same problem: searching the negative sample set takes too long, and we are ready to start working on parallelizing opencv_traincascade with TBB. However, you might have already done this work. So, if you have finished it or are close to finishing, would you be willing to share your code? If not, I will start on it myself (it seems that a parallelized implementation still hasn't appeared anywhere), but I wanted to ask first to avoid unnecessary work.

vskhitkov ( 2013-08-09 04:21:42 -0600 )

1 answer


answered 2013-10-15 14:55:36 -0600

matspetter

I made an attempt to solve this using parallel_for_, and it seems to work. At the top of cascadeclassifier.cpp I added:

class Parallel_predict: public cv::ParallelLoopBody
{
    const vector< Ptr<CvCascadeBoost> >& v;
    int idx;
    int * result;

public:
    Parallel_predict(const vector< Ptr<CvCascadeBoost> >& vectorToProcess, int i, int * r)
    : v(vectorToProcess), idx(i), result(r){ *result=1; }

    // Each worker checks a sub-range of stages and stops once any stage has rejected.
    virtual void operator()( const cv::Range &r ) const {
        for (int i = r.start ; (i!=r.end)&&(*result==1) ; ++i)
            if (v[i]->predict(idx)==.0f)
                *result = 0;  // sample rejected (mirrors "return 0" in the old code)
    }
};

Then, also in cascadeclassifier.cpp, I updated the int CvCascadeClassifier::predict( int sampleIdx ) method to look like this:

int CvCascadeClassifier::predict( int sampleIdx )
{
    CV_DbgAssert( sampleIdx < numPos + numNeg );

    int result;
    Parallel_predict p(stageClassifiers, sampleIdx, &result);
    cv::parallel_for_(cv::Range(0,(int)stageClassifiers.size()), p);
    return result;

    /* OLD CODE
    for (vector< Ptr<CvCascadeBoost> >::iterator it = stageClassifiers.begin();
        it != stageClassifiers.end(); it++ )
        if ( (*it)->predict( sampleIdx ) == 0.f )
            return 0;
    return 1;*/
}

I have only tried this on OSX. It seems to work all right, and it certainly uses all of my cores :) There is, however, more code that could be parallelized (for example in fillPassedSamples), but it is not obvious how to approach those parts.




Hmm, this solution does parallelize part of the code, but unfortunately the nature of that part is such that parallelizing it helps very little, if at all :( So, back to the drawing board...

matspetter ( 2013-10-16 03:33:21 -0600 )
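
One possible explanation for the small gain, offered as an assumption rather than a measurement: the serial loop exits at the first rejecting stage, and most negative candidates are rejected by the early stages, so splitting the stages of a single sample across threads leaves little work to share. An alternative is to parallelize across candidate samples instead. The sketch below illustrates that idea with plain std::thread; Stage, passesCascade and predictAll are hypothetical names, with Stage standing in for a call such as CvCascadeBoost::predict.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical stand-in for one cascade stage: returns true if the sample passes.
// The callable is assumed to be safe to invoke from several threads at once.
typedef std::function<bool(int)> Stage;

// Serial cascade on a single sample with early exit, like the original predict().
bool passesCascade(const std::vector<Stage>& stages, int sampleIdx)
{
    for (size_t s = 0; s < stages.size(); ++s)
        if (!stages[s](sampleIdx))
            return false;
    return true;
}

// Evaluates many samples concurrently. Each thread owns a contiguous chunk of
// sample indices and writes only to its own elements of `passed`, so no locking
// is needed (std::vector<char> is used because vector<bool> packs bits).
std::vector<char> predictAll(const std::vector<Stage>& stages,
                             int numSamples, unsigned numThreads)
{
    assert(numSamples >= 0 && numThreads > 0);
    std::vector<char> passed(numSamples, 0);
    std::vector<std::thread> workers;
    const int chunk = (numSamples + (int)numThreads - 1) / (int)numThreads;
    for (unsigned t = 0; t < numThreads; ++t) {
        const int begin = (int)t * chunk;
        const int end = std::min(numSamples, begin + chunk);
        if (begin >= end) break;  // no samples left for further threads
        workers.emplace_back([&stages, &passed, begin, end]() {
            for (int i = begin; i < end; ++i)
                passed[i] = passesCascade(stages, i) ? 1 : 0;
        });
    }
    for (size_t t = 0; t < workers.size(); ++t)
        workers[t].join();
    return passed;
}
```

In the real trainer this would probably not drop in directly, since the negative image reader and the feature evaluator appear to hold per-sample state that would need to be made per-thread first.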
