
Why does opencv_traincascade not ignore nodes with a false alarm rate of 1

asked 2013-12-14 15:04:51 -0600

_Robert

updated 2013-12-15 14:37:14 -0600

I'm using opencv_traincascade with a one-stage classifier. I don't really know how it works, but it seems like it guesses at rectangles ('features' in CV terminology?) to try to divide the positive samples from the negative samples.

  • HR is hit rate - the proportion of positive samples that are (correctly) passed through.
  • FA is false alarm rate - the proportion of negative samples that are (incorrectly) passed through.

Is my understanding correct?
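If so, I would compute the two rates per stage roughly like this (my own illustration in Python, not OpenCV code; passes_stage is a hypothetical function that applies the stage to a sample):

# Illustration only: HR and FA of one stage, given a hypothetical
# passes_stage(sample) -> bool that applies the stage to a sample.

def hit_rate(positives, passes_stage):
    """Fraction of positive samples the stage (correctly) lets through."""
    return sum(passes_stage(s) for s in positives) / len(positives)

def false_alarm_rate(negatives, passes_stage):
    """Fraction of negative samples the stage (incorrectly) lets through."""
    return sum(passes_stage(s) for s in negatives) / len(negatives)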

My output looks like this:

===== TRAINING 0-stage =====
<BEGIN
POS count : consumed   27 : 27
NEG count : acceptanceRatio    416 : 1
Precalculation time: 3
+----+---------+---------+
|  N |    HR   |    FA   |
+----+---------+---------+
|   1|        1|        1|
+----+---------+---------+
|   2|        1|        1|
+----+---------+---------+
|   3|        1|0.0576923|
+----+---------+---------+
|   4|        1|0.00480769|
+----+---------+---------+
END>

Why does it not ignore feature/node/rectangle number 1 and number 2, since they appear to simply let through everything?


Comments

I am also wondering about this. Any ideas?

_Martin ( 2015-01-29 08:56:25 -0600 )

Forget my comment -_- I am mixing stuff up, it seems!

StevenPuttemans ( 2015-01-31 05:01:43 -0600 )

2 answers


answered 2015-01-31 05:13:30 -0600

You should take a look at the source code of the boosting process to understand it completely, but you are misreading the output and what is happening. Let me explain.

Each stage follows these steps (a rough sketch of the loop follows the list):

  1. Start by grabbing numPos positive and numNeg negative samples for the stage.
  2. Take a first feature from the complete feature pool (which is generated from the model dimensions) that allows the set of positive samples to be classified 100% correctly.
  3. Calculate the FA that this single feature yields on the negative samples (if you selected a weak classifier depth of 1, this feature is a weak classifier stump) and check whether it is already below the maxFalseAlarmRate setting.
  4. If not, iteratively add extra features from the feature pool, each time making sure that the positives stay correctly classified (the hit rate must not drop below the minimum hit rate, for example 0.995) AND that the FA rate on the negative samples drops.
  5. Features keep being added until the stage's FA rate drops below maxFalseAlarmRate. At that point you have a classifier stage that is a bit better than a random guess (50% with the default maxFalseAlarmRate of 0.5), and we then move on to the next stage.
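As a rough Python paraphrase of that loop (not the actual OpenCV boosting code; train_next_stump and stage_hr_fa stand in for the real boosting and evaluation routines):

# Rough paraphrase of training one stage (illustration, not OpenCV source).
# train_next_stump and stage_hr_fa are hypothetical callables supplied by the
# caller: the first returns the next weak classifier chosen by boosting, the
# second returns (hit rate, false alarm rate) of the stage built so far, with
# the stage threshold already set so that HR >= min_hit_rate.

def train_stage(pos, neg, train_next_stump, stage_hr_fa,
                min_hit_rate=0.995, max_false_alarm=0.5):
    stage = []                                  # weak classifiers added so far
    while True:
        stage.append(train_next_stump(pos, neg, stage))
        hr, fa = stage_hr_fa(stage, pos, neg, min_hit_rate)
        print(len(stage), hr, fa)               # the N | HR | FA rows you see
        if fa <= max_false_alarm:               # stage is good enough, stop
            return stage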

When moving to the next stage (a sketch of this refill step follows the list):

  • Discard all positive samples that are wrongly classified by the previous stage and grab new ones to replace them. This is the reason why you should never pass the full amount of positive samples you have as the numPos parameter.
  • Remove all negative samples that were correctly rejected and grab new windows (ones that do not get rejected by any of the previous stages) until you have numNeg of them.
  • Train a new stage of weak classifiers.
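A sketch of the refill step mentioned above (again only an illustration; passes_all(sample, stages) is a hypothetical helper that returns True if every stage trained so far accepts the sample):

# Illustration of refilling the training sets before the next stage
# (not OpenCV code; passes_all is a hypothetical helper).

def refill(pos_pool, neg_windows, stages, num_pos, num_neg, passes_all):
    # keep positives that the stages so far still accept, up to num_pos
    positives = [p for p in pos_pool if passes_all(p, stages)][:num_pos]
    # negatives must be background windows that wrongly pass every previous stage
    negatives = []
    for w in neg_windows:                       # e.g. a generator of windows
        if passes_all(w, stages):
            negatives.append(w)
            if len(negatives) == num_neg:
                break
    return positives, negatives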

Comments


Great post!

Guanta ( 2015-01-31 05:57:58 -0600 )

Just wrote it up for a book chapter I am writing, so it's good to see if people agree or not :D

StevenPuttemans ( 2015-01-31 06:08:20 -0600 )

Thanks for the explanation. I'm trying to dig deeper into the source code and was looking into the boosting process, but somehow I got lost, or at least I can't find where features are selected as weak classifiers. Any help?

One more question for my understanding: Wouldn't it be "better" to select a feature which already filters out some negatives? Or what is the gain in keeping those weak classifiers?

_Martin ( 2015-02-01 09:41:36 -0600 )

Good luck with the source code of boosting, it is a part of machine learning that even I try to avoid. No documentation at all and a bunch of weird abbreviations. To know which features are selected, you can either look at your final model or at the in-between model results of the stages. Each stage has a set of weak classifiers, each denoted by some values, of which the third value is the feature index into the feature set at the bottom of the model file. As to selecting features that already filter out negatives, that is not how boosting works. You first need to guarantee a good rate on your positives, then try to make a model that can eliminate false positive detections by correctly adding negatives and carefully selecting features. Switch that around and training will take ages.
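For example, assuming the new-format cascade.xml that opencv_traincascade writes (where each stump's internalNodes entry reads "left right featureIdx threshold"), something like this should list the selected feature indices:

# Hedged example: list the feature index of every weak classifier in a
# new-format cascade.xml. Assumes each <internalNodes> of a stump holds
# "left right featureIdx threshold", so the third number indexes into the
# <features> list at the bottom of the model file.

import xml.etree.ElementTree as ET

tree = ET.parse("cascade.xml")                  # example path
for node in tree.iter("internalNodes"):
    values = node.text.split()                  # e.g. ['0', '-1', '13569', '-3.2e-03']
    print(int(values[2]))                       # third value = feature index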

StevenPuttemans ( 2015-02-01 12:52:41 -0600 )

I'm sorry to bother again. But when I take a look at the algorithm on Wikipedia (http://en.wikipedia.org/wiki/AdaBoost...), the weak learner is selected by minimizing some error function (misclassification).

But (if I am right) this error should be (in general) higher for a weak learner with HR=1 and FA=1 than for one with HR=1 / FA=0.98.

What am I missing?

_Martin ( 2015-02-02 08:16:11 -0600 )

The minimizing of the error function only starts at the second step of the training, afaik.

StevenPuttemans ( 2015-02-02 08:32:08 -0600 )

answered 2015-02-09 07:00:48 -0600

_Martin

I dug a little deeper into the CV code of CvBoostTree and CvDTree and I got a slightly different explanation:

  • A weak classifier is indeed trained by minimizing some misclassification function (in the case of Real AdaBoost, it's the Gini index, see: CV doc). So the desired hit rate does not come into play in this part of the algorithm.

  • It then takes the best feature (lowest misclassification) and uses this one as the next weak classifier in the current stage.

  • As the last step in this iteration, the algorithm calculates the overall stage threshold. This is where the defined hit rate comes into play: the stage threshold is chosen so that the desired hit rate is guaranteed.
  • In the next iteration, the sample weights are modified according to how the samples were classified in the last iteration, therefore yielding different weak classifiers (decision stumps).

If you take all of this into consideration, it is clear that an HR of 1 and an FA of 1 are possible!
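To make that concrete, here is a small self-contained toy (made-up numbers, not OpenCV code) showing how a stage threshold chosen to guarantee the hit rate can let every negative through as well, i.e. HR = 1 and FA = 1 after the first weak classifier:

# Toy example: the stump is picked for low misclassification, but the STAGE
# threshold is then lowered until min_hit_rate of the positives pass, which
# can let all negatives pass too. Numbers are invented for illustration.

import numpy as np

min_hit_rate = 0.995

pos_scores = np.array([0.9, 0.8, 0.7, 0.65, 0.6])     # boosted scores, positives
neg_scores = np.array([0.85, 0.75, 0.7, 0.66, 0.61])  # boosted scores, negatives

k = int(np.ceil(min_hit_rate * len(pos_scores)))      # positives that must pass
threshold = np.sort(pos_scores)[len(pos_scores) - k]  # k-th highest positive score

hr = np.mean(pos_scores >= threshold)                 # -> 1.0
fa = np.mean(neg_scores >= threshold)                 # -> 1.0: every negative passes too
print(threshold, hr, fa)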


Comments

Thank you for the slight adaptation and clarification :) However, can you explain how we always start with an HR of 1 if that is not a criterion enforced at the start? Is it really possible that, when training cascades with more than 1000 weak classifiers, every single feature a stage starts with still yields 100% HR on the positive set? It seems weird to me...

StevenPuttemans ( 2015-02-10 02:30:42 -0600 )

In theory, the training does not always have to start with an HR of 1 (even though that might be the case in practice). And here is why (instead of using the Gini index, I use the misclassification measure, because it's easier to understand): you start by training each weak classifier in the feature pool so that it yields a low misclassification rate (it does not try to yield a high hit rate). In the case of decision stumps, it searches for a threshold that separates the classes best. After this step, the AdaBoost algorithm selects the "best" weak classifier (the one with the lowest misclassification). Usually (since these are "weak" classifiers) it might be just a bit better than 50%, but most probably not in the high 90s of correct classification...

_Martin ( 2015-02-10 04:28:46 -0600 )

In the next step, the OpenCV cascade algorithm sets a STAGE threshold which guarantees the desired hit rate. So unless there happens to be a single feature that already separates the samples so well that almost all positives are classified correctly on their own (which should almost never be the case), you will always get a "stage HR" of 1 after learning the first weak classifier.

_Martin ( 2015-02-10 04:31:02 -0600 )

Okay, I get your point, but that is, like you say, in theory. I would like to know if in practice OpenCV still forces the HR somehow, because that is quite peculiar behaviour. So at each iteration of a stage, are all the weak classifiers recalculated based on the recalculated weights of the samples?

StevenPuttemans ( 2015-02-10 04:31:40 -0600 )

I guess you haven't read my second comment?

_Martin ( 2015-02-10 05:11:44 -0600 )

Oh nope, didn't see that one yet :) And that stage threshold is adapted each time a new weak classifier is added to the stage of the cascade?

StevenPuttemans ( 2015-02-10 05:41:13 -0600 )

Exactly :)

_Martin ( 2015-02-10 05:49:28 -0600 )

Thanks for the extra info, again something I can change in my notes!

StevenPuttemans ( 2015-02-10 06:25:28 -0600 )
