I made an attempt to solve this using cv::parallel_for_, and it seems to work. At the top of cascadeclassifier.cpp I added:
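// Runs the stage classifiers in parallel for one sample.
// *result stays 1 only if no stage rejects the sample.
class Parallel_predict : public cv::ParallelLoopBody
{
private:
    const vector< Ptr<CvCascadeBoost> >& v; // stage classifiers to evaluate
    int idx;                                // index of the sample to classify
    int* result;                            // shared flag: 1 = accepted so far, 0 = rejected
public:
    Parallel_predict(const vector< Ptr<CvCascadeBoost> >& vectorToProcess, int i, int* r)
        : v(vectorToProcess), idx(i), result(r) { *result = 1; }

    virtual void operator()( const cv::Range& r ) const
    {
        // Stop early once any worker has recorded a rejection.
        for (int i = r.start; (i != r.end) && (*result == 1); ++i)
        {
            if (v[i]->predict(idx) == .0f)
            {
                *result = 0;
                return;
            }
        }
    }
};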
Then, also in cascadeclassifier.cpp, I updated the CvCascadeClassifier::predict( int sampleIdx ) method to look like this:
int CvCascadeClassifier::predict( int sampleIdx )
{
    CV_DbgAssert( sampleIdx < numPos + numNeg );
    int result;
    // Evaluate all stages in parallel; result is set to 0 if any stage rejects the sample.
    Parallel_predict p(stageClassifiers, sampleIdx, &result);
    cv::parallel_for_(cv::Range(0, (int)stageClassifiers.size()), p);
    return result;
    /* OLD CODE
    for (vector< Ptr<CvCascadeBoost> >::iterator it = stageClassifiers.begin();
         it != stageClassifiers.end(); it++ )
    {
        if ( (*it)->predict( sampleIdx ) == 0.f )
            return 0;
    }
    return 1;*/
}
I have only tried this on OS X. It seems to work fine, and it definitely uses all of my cores :) There is, however, more code that could be parallelized (for example in fillPassedSamples), but it is not obvious how to approach it.
/MB
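One caveat with the Parallel_predict class above: its int* result flag is read and written by several worker threads without synchronization, which is formally a data race in C++11 even though every writer stores the same value. Below is a minimal, library-independent sketch of the same early-exit pattern using std::atomic<int>. It is not the OpenCV code: Stage and predictAllStages are hypothetical names, std::function stands in for CvCascadeBoost::predict, and plain std::thread replaces cv::parallel_for_.

```cpp
#include <algorithm>
#include <atomic>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical stand-in for one cascade stage: returns 0.f to reject a sample.
typedef std::function<float(int)> Stage;

// Returns 1 if every stage accepts sampleIdx, 0 if any stage rejects it.
// The stages are split into contiguous chunks, one worker thread per chunk.
int predictAllStages(const std::vector<Stage>& stages, int sampleIdx, unsigned numThreads)
{
    std::atomic<int> result(1); // shared flag: 1 = accepted so far, 0 = rejected
    const std::size_t n = stages.size();
    numThreads = std::max(1u, numThreads);
    const std::size_t chunk = (n + numThreads - 1) / numThreads;

    std::vector<std::thread> workers;
    for (std::size_t begin = 0; begin < n; begin += chunk)
    {
        const std::size_t end = std::min(n, begin + chunk);
        workers.emplace_back([&stages, &result, sampleIdx, begin, end]() {
            // Stop early once any worker has recorded a rejection.
            for (std::size_t i = begin; i < end && result.load() == 1; ++i)
            {
                if (stages[i](sampleIdx) == 0.f)
                {
                    result.store(0);
                    return;
                }
            }
        });
    }
    for (std::thread& t : workers)
        t.join();
    return result.load();
}
```

In the cv::parallel_for_ version, the analogous change would presumably be to pass a std::atomic<int>* instead of an int* into the loop body. Whether the individual stage predictions are themselves safe to call concurrently is a separate question that this sketch does not address.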
Hello! I've worked on an implementation of this, but it currently only supports GCD (macOS). I still have to look into how TBB is used and some other issues. Currently, if I disable TBB with OpenCV, it doesn't use GCD, so I'm trying to find a solution that works with either before posting a patch. There are some other bugs to be fixed too :) Please see: http://code.opencv.org/issues/3147
Hi, if you want to share the code I might be able to port it to TBB (or better - to cv::parallel_for_)
Hello, Tal Darom! We face the same problem: searching over the negative sample set takes too long, and we are ready to start parallelizing opencv_traincascade with TBB. However, you might have already done this work. If you have finished it or are close to finishing, would you be willing to share your code? If not, I will start on it myself (it seems that a parallelized implementation still hasn't appeared anywhere), but I wanted to ask first to avoid unnecessary work.