
How are the 3 basic Haar features formed from the Haar wavelet?

asked 2016-02-08 12:27:00 -0500 by Sanman

I have been digging a lot, and at this point I know exactly how the Viola-Jones algorithm works and how the Haar cascade file is prepared. I am more interested in how the 3 basic Haar features are formed, how they are derived from the Haar wavelet, and why only those specific three?


1 answer


answered 2016-02-09 07:45:22 -0500

updated 2016-02-09 07:46:34 -0500

Did you read the Viola and Jones paper describing face detection using a cascade of boosted features (Haar wavelets in their case)? It clearly states that those basis functions are used because they correspond to the most descriptive facial features, nothing less, nothing more.

That being said, if you want to know exactly how OpenCV builds its Haar wavelet structure, take a look at chapter 5 of the OpenCV 3 Blueprints book, which goes into great depth explaining these features.
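To make the rectangle-feature idea concrete, here is a minimal sketch in plain NumPy (function names are my own, not OpenCV API) of how a 2-rectangle Haar feature is evaluated in constant time from an integral image, which is the core trick Viola and Jones describe:

```python
import numpy as np

def integral_image(img):
    """Cumulative sum over rows and columns, padded with a zero row and
    column so rectangle sums need no boundary checks."""
    ii = np.cumsum(np.cumsum(img, axis=0), axis=1)
    return np.pad(ii, ((1, 0), (1, 0)), mode="constant")

def rect_sum(ii, x, y, w, h):
    """Sum of pixels in the rectangle with top-left (x, y), width w and
    height h, using only four lookups in the padded integral image."""
    return ii[y + h, x + w] - ii[y, x + w] - ii[y + h, x] + ii[y, x]

def two_rect_feature(ii, x, y, w, h):
    """Horizontal 2-rectangle (edge) feature: left half minus right half."""
    half = w // 2
    return rect_sum(ii, x, y, half, h) - rect_sum(ii, x + half, y, half, h)
```

On a patch whose left half is bright and right half is dark, the feature responds strongly positive; on a flat patch it responds with zero, which is what makes these features cheap gradient detectors.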

The image below describes what is being done (mind the small typo at the bottom, which is discussed here).

[image: derivation of the Haar features]

This results in the following features for the Haar model inside OpenCV (top 2 rows; ignore the LBP features here, but no time to cut them off now).

[image: Haar features (top 2 rows) and LBP features]

If you need better and more detailed explanations, buy the book ^_^



I have read the Viola-Jones paper, but they have not mentioned how these 4 basic Haar features are formed. I want to know how they selected only these features: was that just their assumption, or is there some logic behind forming only those typical 4 features? I know how the whole process of face detection works, from the integral image representation and AdaBoost to the cascade classifier, but why only these 4 features?

Sanman (2016-02-17 01:35:33 -0500)

But how can you tell that there are only 4 typical features? That is simply not true... You have 2-rectangle, 3-rectangle and 4-rectangle features, each capturing something that can happen inside an image in terms of gradient or texture information.

StevenPuttemans (2016-02-18 04:44:38 -0500)

That's where I am confused, because there are typically 3 types of features: 2-, 3- and 4-rectangle features. These features vary in size (length and width) as they try to detect the face. So why only these 3 types? They could also have derived a 5-rectangle or a 6-rectangle feature. I know that would have increased the complexity in some way, but why only these specific 3 types, and how did they arrive at the conclusion of deriving only these 3?

Sanman (2016-02-19 00:23:27 -0500)

Because they correspond to features that are retrievable in faces: 2-rectangle features are edge gradients, 3-rectangle features correspond to valleys and ridges, and 4-rectangle features correspond to corners...

StevenPuttemans (2016-02-19 05:52:26 -0500)
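The edge/ridge/corner correspondence above can be sketched as three small kernels; the 2-rectangle pattern is exactly the (unnormalized) Haar mother wavelet, +1 on one half and -1 on the other, which is where the "Haar" in the name comes from. The exact weights below are an assumption following an OpenCV-style zero-mean weighting, not taken verbatim from the paper:

```python
import numpy as np

# Three basic Haar patterns at minimal scale (assumed weighting; real
# detectors scale and shift these across the detection window):
edge_2rect = np.array([[1, -1]])           # 2-rect: step edge (Haar mother wavelet shape)
line_3rect = np.array([[-1, 2, -1]])       # 3-rect: ridge/valley, center weighted so it sums to zero
diag_4rect = np.array([[1, -1],
                       [-1, 1]])           # 4-rect: diagonal/corner structure

def response(kernel, patch):
    """Weighted sum of a patch of equal size against a kernel."""
    return float(np.sum(kernel * patch))
```

A bright-dark step excites only the edge kernel, a thin bright line excites the ridge kernel, and a checkerboard corner excites the diagonal kernel, which is the intuition behind picking exactly these three types.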

People who are wondering about the same question and are not satisfied with this answer can refer to this paper by P. Sinha; it will clear up some things.

Sanman (2016-03-02 00:06:45 -0500)

With this method, can I detect/remove shadows too?

Noobie (2017-03-28 01:05:21 -0500)

@Noobie, how would you use this for shadow detection/removal? Read some papers on the topic first, because the approaches are quite different!

StevenPuttemans (2017-03-28 06:05:45 -0500)