When we train OpenCV's face detection CascadeClassifier, we use 24 x 24 images. The feature rectangles and the related left/right leaf values stored in the XML file are therefore extracted from 24 x 24 images.
But in the detection implementation (the cascadedetect and detection_based_tracker files), the features are extracted from rectangles of the re-sized image (re-sized by a scale factor); the re-sized image is set at line 1014 of cascadedetect.cpp as
if( !featureEvaluator->setImage( image, data.origWinSize ) )
    return false;
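For context, the surrounding multi-scale loop works roughly like the sketch below as far as I understand it (the name scanAllScales and the 1.1 scale step are mine, not the actual OpenCV code): the image itself is shrunk by the current scale factor, while the scan window stays at the original 24 x 24 training size.

#include <opencv2/core.hpp>
#include <opencv2/imgproc.hpp>

// Rough sketch only: build an image pyramid by repeatedly shrinking the
// input and hand each shrunken image to the feature evaluator.
void scanAllScales( const cv::Mat& gray, cv::Size origWinSize /* 24 x 24 */ )
{
    for( double factor = 1.0; ; factor *= 1.1 )   // 1.1 = example scale step
    {
        cv::Size scaledSize( cvRound(gray.cols / factor),
                             cvRound(gray.rows / factor) );
        if( scaledSize.width < origWinSize.width ||
            scaledSize.height < origWinSize.height )
            break;                                // window no longer fits

        cv::Mat scaled;
        cv::resize( gray, scaled, scaledSize, 0, 0, cv::INTER_LINEAR );

        // The evaluator then receives the *resized* image, but the window
        // it slides is still data.origWinSize (24 x 24):
        // featureEvaluator->setImage( scaled, origWinSize );
    }
}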
Then each pixel (x, y) is run through the LBP classifier at line 955 of cascadedetect.cpp as
for( int y = y1; y < y2; y += yStep )
{
    for( int x = 0; x < processingRectSize.width; x += yStep ) { }
}
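As I read it, each (x, y) visited by this loop is the top-left corner of a candidate window of the original training size inside the re-sized image, conceptually like this (classifyWindow is only a placeholder of mine for the cascade evaluation, not an OpenCV function):

#include <opencv2/core.hpp>

// Conceptual sketch only.
void scanOneScale( const cv::Mat& scaled, cv::Size winSize /* 24 x 24 */, int step )
{
    for( int y = 0; y + winSize.height <= scaled.rows; y += step )
        for( int x = 0; x + winSize.width <= scaled.cols; x += step )
        {
            cv::Rect window( x, y, winSize.width, winSize.height );
            // classifyWindow( scaled, window );  // run the 24 x 24 cascade here
        }
}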
Features are calculated from the rectangles of the re-sized image, and an offset value for each pixel location is used in the feature calculation:
int c = featureEvaluator(node.featureIdx); //cascadedetect.hpp line 461
#define CALC_SUM_(p0, p1, p2, p3, offset) ((p0)[offset] - (p1)[offset] - (p2)[offset] + (p3)[offset]) //cascadedetect.hpp line 58
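For reference, here is a minimal sketch of what I think the macro computes mechanically (the names rectSumAt, winTopLeft, etc. are mine): (p0)[offset] is ordinary pointer indexing, i.e. *(p0 + offset), reading the integral-image element offset entries past the one p0 points to.

#include <opencv2/core.hpp>
#include <opencv2/imgproc.hpp>

// Sketch only, no bounds checking: p0..p3 point at the four corners of a
// feature rectangle in the integral image, measured from the window origin,
// and a single offset shifts all four corners to the current window.
int rectSumAt( const cv::Mat& gray, cv::Rect r, cv::Point winTopLeft )
{
    cv::Mat sum;                                  // integral image, CV_32S
    cv::integral( gray, sum, CV_32S );

    const int* base = sum.ptr<int>(0);
    int step = (int)sum.step1();                  // elements per integral row

    const int* p0 = base + r.y * step + r.x;                          // top-left
    const int* p1 = base + r.y * step + (r.x + r.width);              // top-right
    const int* p2 = base + (r.y + r.height) * step + r.x;             // bottom-left
    const int* p3 = base + (r.y + r.height) * step + (r.x + r.width); // bottom-right

    // One number encodes the (x, y) window position inside the image:
    int offset = winTopLeft.y * step + winTopLeft.x;

    // (p)[offset] == *(p + offset): the same corner shifted to this window.
    return p0[offset] - p1[offset] - p2[offset] + p3[offset];
}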
My queries are:
(1) The LBP feature XML is constructed using 24 x 24 images. Why are the feature rectangles taken from the re-sized images (not from 24 x 24 sub-window areas of the re-sized image)?
(2) What does the offset value do in the following line?
#define CALC_SUM_(p0, p1, p2, p3, offset) ((p0)[offset] - (p1)[offset] - (p2)[offset] + (p3)[offset])
I checked that p0 is an integer pointer pointing to an integer value. What does (p0)[offset] mean?
Thanks