Hi,

First of all, I have to note that you copied my description of the formula incompletely. I wrote in that issue: "S is a count of samples from vec-file that can be recognized as background right away". With the partial description of S from the question, the formula does not make sense at all :)

As for the document you asked about: I don't remember writing this formula anywhere except in the issue. The formula is not from any paper, of course; it simply follows from how the traincascade application selects the set of positive samples used to train each stage of a cascade classifier. OK, I'll describe my formula in more detail, as you ask.

numPose - the count of positive samples used to train each stage (do not confuse it with the count of all samples in the vec-file!).

numStages - the number of stages the cascade classifier will have after training.

minHitRate - a training constraint for each stage, which means the following. Suppose a subset of positive samples of size numPose was selected to train the current stage i (i is a zero-based index). After this stage is trained, at least minHitRate * numPose samples from this subset have to pass it, i.e. the current cascade classifier with i+1 stages has to recognize this fraction (minHitRate) of the selected samples as positive.

If some positive samples (falseNegativeCount of them) from the set of size numPose that was used to train stage i were recognized as negative (i.e. background) after the stage was trained, then the numPose - falseNegativeCount correctly recognized positive samples are kept to train stage i+1, and falseNegativeCount new positive samples (unused before) are selected from the vec-file to bring the set back to size numPose.

One more important note: to train the next stage (i+1) we select only samples that pass the current cascade classifier with i+1 stages.
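The refill step described above can be sketched in a few lines of Python. This is only an illustration of the selection logic, not the traincascade source: the function name refill_positive_set, the passes_cascade callback and the integer "samples" are hypothetical stand-ins introduced for the example.

```python
from typing import Callable, Iterator, List, Tuple


def refill_positive_set(
    kept: List[int],
    vec_samples: Iterator[int],
    passes_cascade: Callable[[int], bool],
    num_pose: int,
) -> Tuple[List[int], int, int]:
    """Top up `kept` to `num_pose` samples with unused vec-file samples.

    Only samples accepted by the current cascade are added; rejected ones
    are skipped and never reused. Returns (new_set, consumed, skipped).
    """
    selected = list(kept)
    consumed = 0  # samples read from the vec-file in this refill
    skipped = 0   # samples rejected right away as background
    while len(selected) < num_pose:
        try:
            sample = next(vec_samples)
        except StopIteration:
            raise RuntimeError("vec-file exhausted before the set was refilled") from None
        consumed += 1
        if passes_cascade(sample):
            selected.append(sample)
        else:
            skipped += 1
    return selected, consumed, skipped


# Hypothetical example: 8 of 10 samples survived the stage, so 2 must be replaced.
# The stand-in "cascade" accepts even numbers only.
survivors = list(range(8))
unused = iter([100, 101, 102, 103])
new_set, consumed, skipped = refill_positive_set(
    survivors, unused, lambda s: s % 2 == 0, num_pose=10
)
print(new_set, consumed, skipped)
```

In this toy run, two replacements cost three vec-file samples, because one candidate was skipped. Those skipped candidates are what S counts in the formula.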

Now we are ready to derive the formula. For the training of stage 0 we simply take numPose positive samples from the vec-file. In the worst case, (1 - minHitRate) * numPose of these samples are recognized as negative by the cascade consisting of stage 0 only. So, to get the positive training set for stage 1, we have to select (1 - minHitRate) * numPose new samples from the vec-file that are recognized as positive by that one-stage cascade. During this selection, some new positive samples from the vec-file may be recognized as background right away by the current cascade, and we skip such samples. The number of skipped samples depends on your vec-file (how different the samples in it are) and on the other training parameters. By analogy, for the training of each stage i (i = 1, ..., numStages - 1), in the worst case we have to select (1 - minHitRate) * numPose new positive samples, and several positive samples will be skipped in the process. As a result, to train all stages we need numPose + (numStages - 1) * (1 - minHitRate) * numPose + S positive samples, where S is the count of all skipped samples from the vec-file (over all stages).
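As a quick way to plug numbers into this worst-case estimate, here is a minimal Python sketch. The function name estimate_vec_file_size is made up for the example, and S has to be supplied as a guess, since it cannot be known before training.

```python
def estimate_vec_file_size(
    num_pose: int, num_stages: int, min_hit_rate: float, skipped: float = 0
) -> float:
    """Worst-case number of positive samples needed in the vec-file.

    Implements numPose + (numStages - 1) * (1 - minHitRate) * numPose + S,
    where `skipped` is a guess for S (samples rejected right away).
    """
    # Worst case: this many samples fail each trained stage and must be replaced.
    replaced_per_stage = (1 - min_hit_rate) * num_pose
    return num_pose + (num_stages - 1) * replaced_per_stage + skipped


# Example values chosen for illustration only: the result is about 1095
# when S is assumed to be 0, and about 1295 when S is guessed as 200.
print(estimate_vec_file_size(1000, 20, 0.995))
print(estimate_vec_file_size(1000, 20, 0.995, skipped=200))
```

The result is returned as a float because (1 - minHitRate) * numPose is generally not a whole number. Round up if you need a sample count.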

Of course, this formula does not tell you the exact size of the vec-file (sample count) you need, because S depends on the quality of the samples in the vec-file. But it gives an estimate of this size and explains where the estimate comes from. I hope so :) If you disagree or have more questions, please let me know.
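In practice the vec-file size is usually fixed and numPose is what you choose, so the same formula can be solved for numPose. The sketch below is just that algebraic rearrangement; the function name max_num_pose is hypothetical, and the value passed for S is an assumption you have to make yourself.

```python
import math


def max_num_pose(
    vec_size: int, num_stages: int, min_hit_rate: float, skipped: float = 0
) -> int:
    """Largest numPose that the worst-case formula allows for a given vec-file.

    Solves numPose * (1 + (numStages - 1) * (1 - minHitRate)) + S <= vec_size
    for numPose, with `skipped` as a guess for S.
    """
    # Worst-case number of vec-file samples consumed per unit of numPose.
    cost_per_sample = 1 + (num_stages - 1) * (1 - min_hit_rate)
    return math.floor((vec_size - skipped) / cost_per_sample)


# Illustrative values: 1200 samples in the vec-file, 20 stages,
# minHitRate 0.995, and a guess of 100 skipped samples in total.
print(max_num_pose(1200, 20, 0.995, skipped=100))  # prints 1004
```

If training still runs out of positive samples with this value, the guess for S was too low, and numPose should be reduced further.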

r8913 was a link to SVN revision 8913, but we have since migrated to Git. The corresponding fix is in OpenCV >= 2.4.2. And yes, 2.4.3 is the current version (you can always check this here).