Here it's said that -numPos is used in training for every classifier stage. OK, maybe we really do have to specify in that guide that -numPos != the vec-file sample count. Please open an issue there with a link to this question.
0.9999999... is not a good value for minHitRate, at least because it will result in overly complicated classifiers even at the first stages. That breaks the whole idea of a cascade classifier: the first stages should be weak classifiers that reject a huge amount of background rectangles with cheap checks.
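To see why per-stage values far above ~0.999 buy very little, here is a tiny sketch (the numbers are my own illustration, not from the guide): the cascade-level hit rate is roughly the per-stage minHitRate compounded over all stages.

```python
# Illustration with assumed numbers: the cascade-level hit rate is roughly the
# per-stage minHitRate compounded over all stages, so pushing the per-stage
# value to 0.9999999 gains almost nothing over 0.999 while making every stage
# much more complex.
num_stages = 20
for min_hit_rate in (0.99, 0.999, 0.9999999):
    print(min_hit_rate, "->", round(min_hit_rate ** num_stages, 4))
# 0.99      -> 0.8179
# 0.999     -> 0.9802
# 0.9999999 -> 1.0
```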
S depends on the properties of the vec-file samples, but you can also try to estimate this value. Suppose that all samples in the vec-file have the same probability of being rejected by a given cascade (i.e. of being recognized as background). Of course this is not true in reality, but it is suitable for estimating the vec-file size. In this uniform case, when you try to select falseNegativeCount = (1 - minHitRate) * numPos replacement positives to train the i-th stage, each new sample taken from the vec-file is accepted with probability minHitRate^i. So to select falseNegativeCount samples you will try, on average, falseNegativeCount / minHitRate^i samples from the vec-file. The increasing factor is small: e.g. if i=1, numPos=1000, minHitRate=0.99, falseNegativeCount=10, then you need to try ~10.1 samples from the vec-file on average. A more detailed formula for the vec-file size under this assumption is easy to derive.
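A minimal sketch of that uniform-case estimate (my own script; only the formula comes from the reasoning above):

```python
# Uniform-case estimate: assume every vec-file sample is still accepted after i
# trained stages with probability minHitRate**i (not true in reality, but good
# enough for sizing the vec-file).

def expected_tries(false_negative_count, min_hit_rate, stage_index):
    """Expected vec-file samples consumed to collect false_negative_count
    replacement positives before training stage `stage_index`."""
    return false_negative_count / (min_hit_rate ** stage_index)

num_pos = 1000
min_hit_rate = 0.99
false_negative_count = (1 - min_hit_rate) * num_pos  # = 10

print(round(expected_tries(false_negative_count, min_hit_rate, 1), 1))  # ~10.1

# Rough total consumption over numStages stages: numPos for the first stage,
# plus the replacement samples for every later stage.
num_stages = 20
total = num_pos + sum(expected_tries(false_negative_count, min_hit_rate, i)
                      for i in range(1, num_stages))
print(round(total))  # the vec-file should hold at least roughly this many samples
```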
About wasting hours of work: traincascade stores each trained stage immediately, so if you get an exception you can restart training from the current stage, but with different training parameters and maybe with a richer vec-file.
I also don't recommend downgrading to 2.2. I could not find my commit in the Git history (because of the file reorganization), but I fixed the following problem in traincascade: when traincascade tries samples from the vec-file one by one and reaches the end of the file, it has to finish the training; otherwise it would use duplicate samples. That was the bug.
The answer to the question about 2000 positives: to be sure that you can train a good cascade, try running traincascade with default parameters on a well-tried vec-file. Maybe you should start playing with parameters on that vec-file (not yours), and definitely with LBP features (LBP cuts down the wasted time). For choosing numPos, I think you can follow something like numPos = 0.9 x Num_in_vec, and you can also get a more accurate estimate of this coefficient (instead of 0.9) in the uniform case (it's easy); see the sketch below.
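A hedged sketch of that more accurate estimate (my own rearrangement, with S, the number of samples the cascade rejects as background, treated as a guess): the commonly quoted sizing relation vec_count >= numPos + (numStages - 1) * (1 - minHitRate) * numPos + S can simply be solved for numPos.

```python
# Sketch: pick numPos from the vec-file size by inverting
#   vec_count >= numPos + (numStages - 1) * (1 - minHitRate) * numPos + S
# S (samples rejected as background) is unknown in advance, so pass a guess.

def max_num_pos(vec_count, num_stages, min_hit_rate, s_guess=0):
    """Largest numPos that should not exhaust the vec-file."""
    return int((vec_count - s_guess) / (1 + (num_stages - 1) * (1 - min_hit_rate)))

# Example: 2000 samples in the vec-file, 20 stages, minHitRate = 0.99.
print(max_num_pos(2000, 20, 0.99))  # ~1680, i.e. a coefficient of ~0.84
```

For these particular parameters the coefficient comes out a bit below 0.9, so 0.9 x Num_in_vec is on the optimistic side; adjust it for your numStages and minHitRate.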
About tips on studying the traincascade code: as usual, go from the top down into the details. Here classical Haar is the best feature for getting an understanding faster (especially where features are processed by AdaBoost, because Haar features are ordered, not categorical). For optimization, integral images are used intensively in cascades (keep that in mind). Don't worry about the AdaBoost details at first. Of course you can ask your future detailed questions about the code.
BTW, I guess the moderators will recommend that you open a new question rather than ask more questions in an answer to your own question :)