Implement Viola-Jones from scratch with pre-trained Haar cascade

I’m trying to implement a simple version of the Viola-Jones object detection algorithm. For my particular problem I cannot rely on native C++ code, so I cannot simply reuse the implementations that are already part of OpenCV. I am, however, using a pre-generated Haar cascade XML that is provided as part of OpenCV.

I read the research papers and online documentation relevant to Viola-Jones, but I found that they focus almost entirely on training the Haar cascade and give very little detail about the actual detection algorithm that uses the trained cascade.

My code can read the existing cascade XML file, and I built a basic detection algorithm based on https://stackoverflow.com/questions/34895186. To test it, I created a few positive and negative 24x24 images that the pre-trained cascade should recognize easily, but every image, positive or negative, gets rejected by the 4th or 5th stage. The cascade file that I use has only a single node level and no tilted features.

At a high level, my algorithm does the following:

•   Convert the RGB input image into a gray-level matrix with values between 0.0 and 1.0
•   Convert this matrix into an integral image for easy summation of brightness values in rectangular areas
•   For each stage of the cascade:
    ⁃   Initially set the stage value to 0.0
    ⁃   For each classifier in the stage
        ⁃   Initially set the feature value to 0.0
        ⁃   For each weighted rectangle in the feature's <rects> element
            ⁃   Get the area’s summed-up value from the integral image
            ⁃   Divide the result by the area size
            ⁃   Divide the result by the weight associated with the rectangle in the cascade
            ⁃   Add the result to the feature value
        ⁃   If the feature value is below the classifier’s threshold
            ⁃   Add the value of the left leaf node to the stage value
        ⁃   Else
            ⁃   Add the value of the right leaf node to the stage value
    ⁃   Terminate if the stage value is above the stage's <stageThreshold>
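
To make the steps above concrete, here is a minimal Python sketch of what my code does. It is a literal transcription of the list, not a verified detector: it keeps the division by the rectangle weight and the termination test exactly as I described them, and those are among the parts I am unsure about. The dictionary layout (`classifiers`, `rects`, `threshold`, `left`, `right`) and the function names are hypothetical stand-ins for my parsed cascade, not OpenCV's API or XML schema, and the gray-level conversion is assumed to have happened already.

```python
import numpy as np

def integral_image(gray):
    """Summed-area table with an extra zero row and column at the top/left."""
    ii = np.zeros((gray.shape[0] + 1, gray.shape[1] + 1), dtype=np.float64)
    ii[1:, 1:] = gray.cumsum(axis=0).cumsum(axis=1)
    return ii

def rect_sum(ii, x, y, w, h):
    """Sum of the gray values inside the rectangle at (x, y) with size w x h."""
    return ii[y + h, x + w] - ii[y, x + w] - ii[y + h, x] + ii[y, x]

def evaluate_window(ii, stages):
    """Run the stages as listed above.

    Returns the index of the stage at which evaluation terminated,
    or None if every stage was evaluated without terminating.
    """
    for index, stage in enumerate(stages):
        stage_value = 0.0
        for clf in stage["classifiers"]:
            feature_value = 0.0
            for (x, y, w, h, weight) in clf["rects"]:
                area_sum = rect_sum(ii, x, y, w, h)
                # As described: divide by the area size, then by the weight.
                feature_value += area_sum / (w * h) / weight
            if feature_value < clf["threshold"]:
                stage_value += clf["left"]
            else:
                stage_value += clf["right"]
        # As described: terminate if the stage value is above the threshold.
        if stage_value > stage["threshold"]:
            return index
    return None
```

With a 24x24 gray matrix `gray` holding values in [0.0, 1.0], the call is `evaluate_window(integral_image(gray), stages)`, where `stages` is the list built from the cascade XML.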

Are my math and my high-level structure correct here? Am I missing an important step, or did I misinterpret any of the operations? Any feedback or debugging suggestions would be greatly appreciated!