I hope somebody can explain to me how OpenCV's face detection classes, DetectionBasedTracker and CascadeClassifier, do their trick in face detection.
When we train the cascade classifier, we use 24 x 24 images. The resulting xml file then contains the rectangles for the LBP features, and those rectangles are expressed relative to the 24 x 24 training window.
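For concreteness, this is how I picture one of those rectangles being applied once a detection window is anchored at some (x, y) (the values and the names featureRect / rectInImage are invented for illustration, not taken from the xml):

#include <opencv2/core.hpp>

// Hypothetical LBP cell rectangle from the xml, stored relative to
// the 24 x 24 training window (values invented for illustration).
const cv::Rect featureRect(3, 5, 8, 6);

// With a detection window anchored at (x, y), the same rectangle is
// read at (x + 3, y + 5) of whatever image is currently being scanned.
cv::Rect rectInImage(int x, int y)
{
    return cv::Rect(x + featureRect.x, y + featureRect.y,
                    featureRect.width, featureRect.height);
}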
But when detection runs in the CascadeClassifier class, the features for those rectangles are calculated from resized versions of the input image (not from a 24 x 24 image), as shown below (line 1014 of cascadedetect.cpp).
// original image is scaled by a scale factor
if( !featureEvaluator->setImage( image, data.origWinSize ) )
    return false;
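If it helps to pin down my current understanding: I believe the detector never enlarges the 24 x 24 features; instead it repeatedly shrinks the image and always scans with the fixed training-size window. A minimal sketch of that pyramid idea, my own simplification rather than the actual cascadedetect.cpp code:

#include <opencv2/imgproc.hpp>

// Shrink the image step by step; a fixed 24x24 window on the shrunken
// image corresponds to a progressively larger region of the original.
void scanPyramid(const cv::Mat& original, double scaleFactor)
{
    const cv::Size winSize(24, 24);  // origWinSize from training

    for (double factor = 1.0; ; factor *= scaleFactor)
    {
        cv::Size scaled(cvRound(original.cols / factor),
                        cvRound(original.rows / factor));
        if (scaled.width < winSize.width || scaled.height < winSize.height)
            break;  // image is now smaller than the training window

        cv::Mat image;
        cv::resize(original, image, scaled, 0, 0, cv::INTER_LINEAR);

        // featureEvaluator->setImage(image, winSize) would be called here;
        // a 24x24 hit at (x, y) would map back to a (24*factor) x (24*factor)
        // face at (x*factor, y*factor) in the original image.
    }
}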
Then the whole resized image is processed: for each (x, y) position, the 24 x 24 window anchored there is evaluated.
for( int y = y1; y < y2; y += yStep )
{
    for( int x = 0; x < processingRectSize.width; x += yStep )
    {
    }
}
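Spelled out, I read that scan roughly as follows (only x, y, yStep and the loop bounds come from the real code; sumStep and the comments are my own guesses):

#include <cstddef>

// Sketch: every (x, y) is a candidate window origin, and the only
// per-window state is a single offset into the integral image.
void scanWindows(int y1, int y2, int width, int yStep, std::size_t sumStep)
{
    for (int y = y1; y < y2; y += yStep)
        for (int x = 0; x < width; x += yStep)
        {
            std::size_t offset = y * sumStep + x;  // one number per window
            // the whole cascade would be evaluated for the 24 x 24 window
            // at (x, y), reading every feature through this offset
            (void)offset;
        }
}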
Then an offset value is used to relate a feature's rectangle to its (x, y) position in the resized image.
#define CALC_SUM_(p0, p1, p2, p3, offset) \
    ((p0)[offset] - (p1)[offset] - (p2)[offset] + (p3)[offset])
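And this is the integral-image arithmetic I believe the macro performs: the four corner pointers p0..p3 are set up once per feature, relative to the window origin, and the offset simply translates all four to the current window (integralRectSum and the variable names are mine, not OpenCV's):

#include <cstddef>
#include <opencv2/core.hpp>

#define CALC_SUM_(p0, p1, p2, p3, offset) \
    ((p0)[offset] - (p1)[offset] - (p2)[offset] + (p3)[offset])

// Sum of pixels inside r for the detection window whose top-left corner
// lies `offset` elements from the start of the integral image `sum`
// (sum = CV_32S integral image, one row/column larger than the source).
int integralRectSum(const cv::Mat& sum, cv::Rect r, int offset)
{
    const int* ptr = sum.ptr<int>(0);
    std::size_t step = sum.step / sizeof(int);

    // Four corner pointers, fixed once per feature relative to the
    // window origin: top-left, top-right, bottom-left, bottom-right.
    const int* p0 = ptr + r.y * step + r.x;
    const int* p1 = ptr + r.y * step + (r.x + r.width);
    const int* p2 = ptr + (r.y + r.height) * step + r.x;
    const int* p3 = ptr + (r.y + r.height) * step + (r.x + r.width);

    return CALC_SUM_(p0, p1, p2, p3, offset);
}

If my reading is right, integralRectSum(sum, featureRect, y * step + x) would yield the pixel sum of featureRect translated to the window at (x, y), so the rectangles themselves never get rescaled.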
What I don't understand:
- Even though the feature rectangles in the xml file were learned from 24 x 24 images during training, in real detection the features are calculated from those rectangles on resized images, with an offset applied. How does this offset value do the trick?
- My thinking is that if the feature rectangles were learned from 24 x 24 images in training, detection should also use 24 x 24 images. What is the trick that lets the features be calculated from these rectangles on resized images?

Thanks