unnecessary feature learned in traincascade?
Hey,
I have a weird result using traincascade which I can't explain. I created a small set of dummy data just to understand what traincascade does.
I get the following results:
===== TRAINING 0-stage =====
<BEGIN
POS count : consumed 17 : 17
NEG count : acceptanceRatio 35 : 1
Precalculation time: 1
+----+---------+---------+
| N | HR | FA |
+----+---------+---------+
| 1| 1| 0.428571|
+----+---------+---------+
| 2| 1| 0.428571|
+----+---------+---------+
| 3| 1| 0.142857|
+----+---------+---------+
| 4| 1| 0|
+----+---------+---------+
END>
and the created xml:
<stages>
<!-- stage 0 -->
<_>
<maxWeakCount>4</maxWeakCount>
<stageThreshold>2.4513483047485352e+00</stageThreshold>
<weakClassifiers>
<_>
<internalNodes>
0 -1 0 744.</internalNodes>
<leafValues>
9.0286773443222046e-01 -9.0286773443222046e-01</leafValues></_>
<_>
<internalNodes>
0 -1 1 -1709.</internalNodes>
<leafValues>
-1.2098379135131836e+00 -1.2098379135131836e+00</leafValues></_>
<_>
<internalNodes>
0 -1 1 -1709.</internalNodes>
<leafValues>
-1.4120784997940063e+00 1.4120784997940063e+00</leafValues></_>
<_>
<internalNodes>
0 -1 2 3.5550000000000000e+02</internalNodes>
<leafValues>
-1.3462400436401367e+00 1.3462400436401367e+00</leafValues></_></weakClassifiers></_>
</stages>
From the training output, I'd say that weak classifier #2 does not improve the result: the false alarm rate stays at 0.428571 from N=1 to N=2. If you take a look at the corresponding decision stump in the xml output, you see the following leafValues:
-1.2098379135131836e+00 -1.2098379135131836e+00
They are exactly the same. So how does this weak classifier help in the classification task? I cannot explain what happens during learning here.
Testing the classifier by running detection on a random image gives exactly the same results, whether I include this weak classifier or not.
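To check that, I re-evaluated stage 0 by hand from the xml above. This is only a rough sketch of how I understand the stage evaluation (sum the selected leaf values and compare against stageThreshold; I'm assuming "feature value < node threshold" selects the first leaf, otherwise the second, but for stump #2 it doesn't matter since both leaves are equal anyway):

# toy re-evaluation of stage 0 from the xml above (not using OpenCV itself)
# each stump: (feature index, node threshold, leaf if value < threshold, leaf otherwise)
stumps = [
    (0,   744.0,  0.9028677, -0.9028677),
    (1, -1709.0, -1.2098379, -1.2098379),  # stump #2: both leaves identical
    (1, -1709.0, -1.4120785,  1.4120785),
    (2,   355.5, -1.3462400,  1.3462400),
]
stage_threshold = 2.4513483

def stage_sum(feature_values):
    # feature_values[i] = value of Haar feature i for one sample (made-up numbers below)
    total = 0.0
    for feat, thr, left, right in stumps:
        total += left if feature_values[feat] < thr else right
    return total

# whichever branch the other stumps take, stump #2 always contributes -1.2098379,
# i.e. it shifts every sample's score by the same constant
s = stage_sum({0: 800.0, 1: -2000.0, 2: 400.0})
print(s, s >= stage_threshold)

Note that 0.9028677 - 1.2098379 + 1.4120785 + 1.3462400 = 2.4513483, which is exactly the stageThreshold. So (if a window passes the stage when the sum reaches the threshold) only samples that get the larger leaf value from each of stumps #1, #3 and #4 can pass, and stump #2 merely shifts every score and the threshold by the same constant. That would explain why dropping it does not change my detections, but not why boosting learns it in the first place.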
Can somebody explain this behavior?
What do you mean by dummy data? In very simplistic terms, what traincascade does is examine the positive samples you provide and try to find a pattern in them, so it can look for that pattern in future, unseen data.
If you provide totally random data (and responses to that data), it is only natural that traincascade won't be able to derive any kind of coherent model.
Ok, it seems I used the wrong term; "dummy" should probably have been "synthetic". What I did was create a pattern which I would like to detect. I overlaid that pattern on some background, and therefore have a pretty good idea of what the features could/should look like.
My main question is: how can it be that both leaf values of a learned decision stump are equal? And how can the same feature (with the same threshold) lead to different leaf values later on? Isn't the former weak classifier then redundant?
In fact, decision stump #2 doesn't seem to contribute to the classifier at all, but I have no idea why.
I'm not sure I have the answer, but I think I found something that might be related: in the case above, I used the Discrete AdaBoost algorithm, and I do not get decision stump #2 when using Real AdaBoost. So my guess is: since DAB only outputs class labels {-1, 1} while RAB outputs real numbers, DAB arrives at weak classifier #3 only after reweighting the samples.
Might that be it?
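To make the reweighting argument a bit more concrete, here is a toy sketch of one round of textbook Discrete AdaBoost (my own illustration, not the OpenCV code; dab_round is a made-up helper, and the 17/35 split just mirrors the log above). Even a stump that predicts the same class for every sample has a weighted error below 0.5 here, so it gets a non-zero vote and, more importantly, it changes the sample weights. The next round therefore sees a different weighting, and the same feature with the same threshold can end up with differently signed leaves:

import numpy as np

# one round of textbook Discrete AdaBoost; y and h_pred take values in {-1, +1}
def dab_round(weights, y, h_pred):
    err = np.sum(weights[h_pred != y])               # weighted error of the stump
    alpha = 0.5 * np.log((1.0 - err) / err)          # vote of the stump
    weights = weights * np.exp(-alpha * y * h_pred)  # upweight misclassified samples
    return weights / weights.sum(), alpha

# 17 positives and 35 negatives with uniform initial weights, like in the log above
y = np.array([1] * 17 + [-1] * 35)
w = np.ones(len(y)) / len(y)

# a "constant" stump like #2 that answers -1 for every sample:
w, alpha = dab_round(w, y, np.full_like(y, -1))
print(alpha)         # non-zero vote
print(w[:17].sum())  # the positives now carry about half of the total weight
                     # instead of 17/52, so the next stump is fit to different data

Whether the OpenCV implementation behaves exactly like this I cannot say, but it would at least be consistent with what I see: the degenerate stump only shows up with DAB, and its only lasting effect seems to be the reweighting.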