Hi,
First of all, I have to note that you copied my description of the formula incompletely. In that issue I wrote: "S is a count of samples from vec-file that can be recognized as background right away". With the partial description of S from the question, the formula does not make sense at all :)
As for the document you asked about: I don't remember writing this formula anywhere except in the issue. The formula is not from any paper, of course; it just follows from how the traincascade application selects a set of positive samples to train each stage of a cascade classifier. OK, I'll describe my formula in more detail, as you asked.
numPose - the count of positive samples used to train each stage (do not confuse it with the count of all samples in the vec-file!).
numStages - the number of stages the cascade classifier will have after training.
minHitRate - a training constraint for each stage, which means the following. Suppose a subset of numPose positive samples was selected to train the current stage i (i is a zero-based index). After this stage is trained, at least minHitRate * numPose samples from this subset have to pass it, i.e. the current cascade classifier with i+1 stages has to recognize this fraction (minHitRate) of the selected samples as positive.
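To make the minHitRate constraint concrete, here is a minimal sketch of it in Python. It only restates the rule above; it is not the traincascade code, and the function names are hypothetical.

```python
import math


def min_samples_that_must_pass(num_pose, min_hit_rate):
    """Smallest count of the num_pose training positives that must pass the stage."""
    # round() guards against floating-point noise such as 994.9999999999999
    return math.ceil(round(min_hit_rate * num_pose, 9))


def stage_satisfies_min_hit_rate(passed, num_pose, min_hit_rate):
    """True if enough of the stage's positive samples are still recognized as positive."""
    return passed >= min_samples_that_must_pass(num_pose, min_hit_rate)
```

For example, with numPose = 1000 and minHitRate = 0.995, at least 995 of the selected samples have to pass the stage.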
If some positive samples (falseNegativeCount of them) from the set of size numPose used to train stage i are recognized as negative (i.e. background) after the stage is trained, then the numPose - falseNegativeCount correctly recognized positive samples are kept to train stage i+1, and falseNegativeCount new positive samples (unused before) are selected from the vec-file to bring the set back to size numPose.
One more important note: to train the next stage i+1, we select only samples that pass the current cascade classifier with i+1 stages.
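The refill step described above can be sketched roughly as follows. This is an illustration of the selection rule only, not the actual traincascade implementation: refill_positive_set and passes_cascade are hypothetical names, and the samples can be any Python objects.

```python
def refill_positive_set(kept, unused, num_pose, passes_cascade):
    """Top the positive training set back up to num_pose samples.

    kept           -- samples from the previous stage still recognized as positive
    unused         -- iterator over vec-file samples that were not read yet
    passes_cascade -- hypothetical predicate: True if the current cascade
                      recognizes the sample as positive
    Returns (training_set, skipped), where skipped counts the samples that the
    current cascade rejected as background right away.
    """
    training_set = list(kept)
    skipped = 0
    while len(training_set) < num_pose:
        try:
            sample = next(unused)
        except StopIteration:
            # The vec-file ran out before the set reached num_pose samples.
            raise RuntimeError("not enough positive samples in the vec-file") from None
        if passes_cascade(sample):
            training_set.append(sample)
        else:
            skipped += 1  # consumed from the vec-file, but not usable
    return training_set, skipped
```

Every sample drawn from the iterator is consumed from the vec-file whether it is accepted or skipped, which is why the skipped samples have to be counted as well.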
Now we are ready to derive the formula. To train stage 0 we simply take numPose positive samples from the vec-file. In the worst case, (1 - minHitRate) * numPose of these samples are recognized as negative by the cascade consisting of stage 0 only. So, in this case, to get a training set of positive samples for stage 1 we have to select (1 - minHitRate) * numPose new samples from the vec-file that are recognized as positive by the cascade consisting of stage 0 only. During the selection, some new positive samples from the vec-file may be recognized as background right away by the current cascade, and we skip such samples. The count of skipped samples depends on your vec-file (how different the samples in it are) and on the other training parameters. By analogy, to train each stage i (i = 1, ..., numStages-1), in the worst case we have to select (1 - minHitRate) * numPose new positive samples, and several positive samples will be skipped in the process. As a result, to train all stages we need numPose + (numStages - 1) * (1 - minHitRate) * numPose + S positive samples, where S is the count of all skipped samples from the vec-file (over all stages).
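As a quick way to play with the numbers, the formula can be written as a small Python function. This is only a sketch of the estimate above; the function name is hypothetical, and skipped (S) is something you have to guess or measure, since it depends on your data.

```python
import math


def required_positive_samples(num_pose, num_stages, min_hit_rate, skipped=0):
    """Worst-case estimate of how many positive samples the vec-file must hold.

    Implements numPose + (numStages - 1) * (1 - minHitRate) * numPose + S.
    skipped is S; with the default of 0 the result is only a lower bound.
    """
    if num_pose < 1 or num_stages < 1:
        raise ValueError("num_pose and num_stages must be at least 1")
    if not 0.0 < min_hit_rate <= 1.0:
        raise ValueError("min_hit_rate must be in (0, 1]")
    replaced_per_stage = (1.0 - min_hit_rate) * num_pose
    total = num_pose + (num_stages - 1) * replaced_per_stage + skipped
    # round() removes floating-point noise before rounding up to whole samples
    return math.ceil(round(total, 9))


print(required_positive_samples(1000, 20, 0.995))
```

With numPose = 1000, numStages = 20 and minHitRate = 0.995, this gives 1000 + 19 * 5 = 1095 samples before S is added.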
Of course, this formula does not tell you the exact size of the vec-file (the sample count) you must have, because S depends on the quality of the samples in the vec-file. But it gives an estimate of this size and explains where the estimate comes from. I hope it is so :) If you disagree or have more questions, please let me know.
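In practice the question often goes the other way: the vec-file is already there, and you want to know how large numPose can be. Solving the same formula for numPose gives the sketch below. The function name is hypothetical, and skipped is again a guess for S, so treat the result as an upper bound rather than an exact answer.

```python
import math


def max_num_pose(vec_count, num_stages, min_hit_rate, skipped=0):
    """Largest numPose the formula allows for a vec-file with vec_count samples.

    Rearranged from vec_count >= numPose * (1 + (numStages - 1) * (1 - minHitRate)) + S.
    """
    usable = vec_count - skipped
    if usable <= 0:
        return 0
    per_sample_cost = 1.0 + (num_stages - 1) * (1.0 - min_hit_rate)
    # round() removes floating-point noise before rounding down
    return math.floor(round(usable / per_sample_cost, 9))
```

For example, a vec-file with 2000 samples, numStages = 20, minHitRate = 0.995 and a guessed S of 100 leaves room for numPose of about 1735.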
r8913 was a link to SVN revision 8913, but we have since migrated to Git. The corresponding fix is in OpenCV >= 2.4.2. And yes, 2.4.3 is the current version (you can always check this here).