In my multithreaded C++ program, calls to CascadeClassifier_GPU::detectMultiScale malfunction unless I wrap them in a critical section. This happens even though each calling thread has its own separate instance of CascadeClassifier_GPU.
Why is this? Is it a bug?
In the code below, removing the scoped lock is all it takes to make it break. This suggests that detectMultiScale is not thread-safe, even across separate instances.
void OcvGpuFaceFinder::detectMultiScale( const cv::Mat & img, std::vector< cv::Rect> & faceRects
    , double scaleFactor
    , int minNeighbors, int flagsIgnored
    , cv::Size minFaceSize
    , cv::Size maxFaceSizeIgnored
    )
{
    static MJCCritSect critter;
    cv::gpu::GpuMat d_img;
    d_img.upload( img );
    int numFound = 0;
    cv::Mat rectMat;
    cv::gpu::GpuMat d_objBuf;
    {
        MJCCritSect::ScopedLocker locker( critter );
        numFound = m_impl.detectMultiScale( d_img, d_objBuf
            , scaleFactor
            , minNeighbors
            , minFaceSize
            );
    }
    // download the part of the gpu dest Mat that contains found face rectangles
    d_objBuf.colRange(0, numFound).download( rectMat );
    cv::Rect* faces = rectMat.ptr<cv::Rect>();
    faceRects.clear();
    // copy face rects to final destination
    for( int ii = 0; ii < numFound; ++ii )
    {
        faceRects.push_back( faces[ ii ] );
    }
}
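
For anyone who wants to reason about the locking pattern without OpenCV or a GPU, here is a minimal sketch of the same idea using only the standard library. Everything in it is a stand-in: FakeDetector, detectorMutex, lockedDetect and runWorkers are hypothetical names, and the shared static counter is only an assumption about the kind of hidden shared state that could make separate instances interfere. I have not confirmed that CascadeClassifier_GPU works this way internally. The std::mutex plays the role of my static MJCCritSect.

```cpp
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for a detector whose instances secretly share state.
// This is an assumption for illustration, not a description of
// CascadeClassifier_GPU internals.
class FakeDetector {
public:
    int detect() {
        // Unsynchronized read-modify-write on state shared by all instances.
        int before = s_sharedScratch;
        s_sharedScratch = before + 1;
        return 1;
    }
    static int sharedScratch() { return s_sharedScratch; }
    static void reset() { s_sharedScratch = 0; }
private:
    static int s_sharedScratch;
};
int FakeDetector::s_sharedScratch = 0;

// One process-wide mutex, analogous to the function-local static critical section.
std::mutex& detectorMutex() {
    static std::mutex m;
    return m;
}

// Serializes every detect() call, no matter which instance is used.
int lockedDetect(FakeDetector& det) {
    std::lock_guard<std::mutex> lock(detectorMutex());
    return det.detect();
}

// Starts numThreads workers, each with its own detector instance, and
// returns the value of the shared state after all of them have finished.
int runWorkers(int numThreads, int callsPerThread) {
    FakeDetector::reset();
    std::vector<std::thread> workers;
    for (int t = 0; t < numThreads; ++t) {
        workers.emplace_back([callsPerThread] {
            FakeDetector det;  // separate instance per thread
            for (int i = 0; i < callsPerThread; ++i) {
                lockedDetect(det);
            }
        });
    }
    for (auto& w : workers) {
        w.join();
    }
    return FakeDetector::sharedScratch();
}
```

With the lock in place, runWorkers(4, 1000) leaves the shared counter at exactly 4000. Calling det.detect() directly from the workers instead would be a data race on the shared state, which is the kind of failure I suspect is happening in my real code when the scoped lock is removed.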