Traincascade Error: Bad argument (Can not get new positive sample. The most possible reason is insufficient count of samples in given vec-file.)

Hi all,

Here: http://code.opencv.org/issues/1834 Maria Dimashova gives the formula: the vec-file has to contain >= (numPose + (numStages-1) * (1 - minHitRate) * numPose) + S, where S is a count of samples from the vec-file.

First, I would like to know where to find this formula, in which document. Second, she writes: "It was fixed in r8913". What does r8913 mean? Is it the current OpenCV version, 2.4.3? Thank you.


(2013-04-10 03:50:02 -0500)


Hi,

First of all, I have to note that you copied my description of the formula incompletely. I wrote in that issue: "S is a count of samples from vec-file that can be recognized as background right away". With only the partial description of S given in the question, the formula does not make sense at all :)

As for the document you asked about: I don't remember writing this formula anywhere except in the issue. The formula is not from any paper, of course; it simply follows from how the traincascade application selects a set of positive samples to train each stage of a cascade classifier. OK, I'll describe my formula in more detail, as you asked.

numPose - the number of positive samples used to train each stage (do not confuse it with the total number of samples in the vec-file!).

numStages - the number of stages the cascade classifier will have after training.

minHitRate - a training constraint for each stage, meaning the following. Suppose a subset of positive samples of size numPose was selected to train the current i-stage (i is a zero-based index). After the current stage is trained, at least minHitRate * numPose samples from this subset have to pass this stage, i.e. the current cascade classifier with i+1 stages has to recognize this fraction (minHitRate) of the selected samples as positive.

If some positive samples (falseNegativeCount of them) from the set of size numPose used to train the i-stage were recognized as negative (i.e. background) after the stage was trained, then the numPose - falseNegativeCount correctly recognized positive samples are kept to train the (i+1)-stage, and falseNegativeCount new positive samples (not used before) are selected from the vec-file to bring the set back to size numPose.

One more important note: to train the next (i+1)-stage we select only samples that pass the current cascade classifier with i+1 stages.
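The replenishment step above can be sketched in Python. This is a simplified illustration, not the traincascade source: samples are plain integers, and `passes_cascade` is a hypothetical predicate standing in for "the current cascade with i+1 stages recognizes this sample as positive". The function also counts the skipped samples, i.e. this stage's contribution to S.

```python
from typing import Callable, Iterator, List, Tuple


def refill_positive_set(kept: List[int], num_pos: int,
                        vec_samples: Iterator[int],
                        passes_cascade: Callable[[int], bool]) -> Tuple[List[int], int]:
    """Top the kept positives back up to num_pos with unused vec-file samples.

    kept           -- positives from the previous stage that still pass the cascade
    vec_samples    -- iterator over vec-file samples that were never used before
    passes_cascade -- hypothetical predicate: True if the current cascade
                      recognizes the sample as positive
    Returns the new training set and the number of skipped samples.
    """
    train_set = list(kept)
    skipped = 0  # samples rejected as background right away (part of S)
    while len(train_set) < num_pos:
        try:
            sample = next(vec_samples)
        except StopIteration:
            # Running out of the vec-file is the situation behind the
            # "Can not get new positive sample" error.
            raise RuntimeError("Can not get new positive sample") from None
        if passes_cascade(sample):
            train_set.append(sample)
        else:
            skipped += 1
    return train_set, skipped
```

Every sample drawn from the iterator is consumed whether it is kept or skipped, which is why the vec-file has to be larger than numPose.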

Now we are ready to derive the formula. For the 0-stage training we just take numPose positive samples from the vec-file. In the worst case, (1 - minHitRate) * numPose of these samples are recognized as negative by the cascade consisting of the 0-stage only. So in this case, to get a training set of positive samples for the 1-stage training, we have to select (1 - minHitRate) * numPose new samples from the vec-file that are recognized as positive by the cascade with the 0-stage only. During this selection, some new positive samples from the vec-file can be recognized as background right away by the current cascade, and we skip such samples. The number of skipped samples depends on your vec-file (how different the samples in it are) and on the other training parameters. By analogy, for each i-stage training (i = 1, ..., numStages-1), in the worst case we have to select (1 - minHitRate) * numPose new positive samples, and several positive samples will be skipped in the process. As a result, to train all stages we need numPose + (numStages - 1) * (1 - minHitRate) * numPose + S positive samples, where S is a count of all the skipped samples from ...
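The worst-case bound can be written as a small Python helper. `worst_case_vec_count` is a hypothetical name; S has to be supplied by the caller as an estimate, because it depends on the vec-file and is not known in advance.

```python
def worst_case_vec_count(num_pos: int, num_stages: int,
                         min_hit_rate: float, skipped: float = 0) -> float:
    """numPose + (numStages - 1) * (1 - minHitRate) * numPose + S.

    skipped is S (samples rejected as background right away); it cannot be
    known in advance, so pass an estimate, or 0 for the bound without S.
    """
    if not 0.0 < min_hit_rate <= 1.0:
        raise ValueError("min_hit_rate must be in (0, 1]")
    replaced_per_stage = (1.0 - min_hit_rate) * num_pos  # worst-case false negatives
    return num_pos + (num_stages - 1) * replaced_per_stage + skipped


# e.g. 1000 positives per stage, 20 stages, minHitRate 0.995
print(worst_case_vec_count(1000, 20, 0.995))  # about 1095, plus S
```

The result is a real number because (1 - minHitRate) * numPose need not be an integer; round it up when comparing against an actual vec-file size.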


Thank you for the clearer explanation! But I'm a bit confused about the "-numPos" argument. I would like to know whether the current fix allows using the total number of positive samples for each stage; in other words, can I pass the total number in "-numPos"? Or do I still need to use your formula?

(2012-11-23 04:39:21 -0500)

Yes, you still need to keep this formula in mind. That fix was only about throwing an exception with an error message for the user if there are not enough positive samples for the next stage training, because previously there was an assertion at that point in the code.

(2012-11-23 05:50:08 -0500)

And why don't you put that formula and its result in the thrown message...?

It would save a lot of people a lot of time.

(2014-02-11 07:17:27 -0500)

Thanks, Maria, for a great explanation.

(2016-12-14 01:34:52 -0500)

I guess the moderators will recommend that you open a new question rather than ask more questions in an answer to your own question :)

Here it's said that -numPos is used in training for every classifier stage. OK, maybe we really do have to state in that guide that -numPos != the number of samples in the vec-file. Please open an issue here with a link to this question.

0.9999999... is not a good value of minHitRate, if only because it will result in complicated classifiers even at the first stages. This breaks the idea of a cascade classifier, which is to have weak classifiers at the beginning that reject a huge number of background rectangles with the cheaper checks of the first stages.

S depends on the properties of the vec-file samples, but you can also try to estimate this value. Suppose that the samples in the vec-file have equal probabilities of being rejected by a given cascade (recognized as background). Of course this is not true in reality, but it's suitable for estimating the vec-file size. In this uniform case, when you try to select falseNegativeCount positive samples to train the i-stage, each new sample from the vec-file is selected with probability minHitRate^i. So to select falseNegativeCount samples you will try on average falseNegativeCount / minHitRate^i samples from the vec-file. The increase factor is small. E.g. if i=1, numPos=1000, minHitRate=0.99 and falseNegativeCount=10, then you need to try on average ~10.1 samples from the vec-file. You can easily derive a more detailed formula for the vec-file size under this assumption.
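A Python sketch of this uniform-case estimate, assuming (as above) that every vec-file sample passes the current i-stage cascade independently with probability minHitRate^i, and taking the worst case falseNegativeCount = (1 - minHitRate) * numPos at every stage. The function names are illustrative only.

```python
def expected_draws(false_negative_count: float, min_hit_rate: float,
                   stage_index: int) -> float:
    """Average number of vec-file samples tried to find false_negative_count
    new positives for stage i, when each sample passes the current cascade
    (i stages) with probability min_hit_rate ** i (uniform model)."""
    return false_negative_count / min_hit_rate ** stage_index


def expected_vec_usage(num_pos: int, num_stages: int, min_hit_rate: float) -> float:
    """Expected vec-file samples consumed over all stages in the worst case."""
    total = float(num_pos)  # stage 0: the cascade is empty, nothing is rejected
    false_negatives = (1.0 - min_hit_rate) * num_pos  # worst case per stage
    for i in range(1, num_stages):
        total += expected_draws(false_negatives, min_hit_rate, i)
    return total


print(expected_draws(10, 0.99, 1))  # about 10.1, matching the example above
```

Under this model the skipped samples S are implied by the difference between `expected_vec_usage` and the worst-case count without S.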

As for wasting hours of work: traincascade stores each trained stage immediately, so if you get an exception you can try to restart training from the current stage, but with different training parameters and perhaps a richer vec-file.

I also don't recommend downgrading to 2.2. I could not find my commit in the Git history (because of the file reorganization), but I fixed the following problem in traincascade: when traincascade tries samples from the vec-file one by one and reaches the end of the file, it has to finish the training; otherwise it would use duplicate samples. That was the bug.

As for your question about 2000 positives: to be sure that you can train a good cascade, try traincascade with default parameters on a well-tested vec-file. You should probably start experimenting with parameters on that vec-file (not yours), and definitely with LBP features (LBP reduces the time wasted). For choosing numPos, I think you can follow something like numPos = 0.9 x Num_in_vec, and you can also get a more accurate estimate of this coefficient (instead of 0.9) in the uniform case (it's easy).
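One way to do that "easy" derivation, under the same uniform and worst-case assumptions as above (my own sketch, not something traincascade computes): the expected number of vec-file samples used is numPos * (1 + (1 - minHitRate) * sum_{i=1..numStages-1} minHitRate^-i), and this geometric series collapses to numPos / minHitRate^(numStages-1). So the coefficient is minHitRate^(numStages-1); for example, 20 stages with minHitRate = 0.995 give about 0.91. The code below sums the series explicitly so the closed form can be checked against it; the helper names are hypothetical.

```python
import math


def num_pos_coefficient(num_stages: int, min_hit_rate: float) -> float:
    """Largest numPos / Num_in_vec ratio whose expected vec-file usage fits,
    under the uniform model with worst-case replacements at every stage."""
    if not 0.0 < min_hit_rate <= 1.0:
        raise ValueError("min_hit_rate must be in (0, 1]")
    # sum over stages i = 1 .. numStages-1 of minHitRate^-i
    growth = sum(min_hit_rate ** -i for i in range(1, num_stages))
    return 1.0 / (1.0 + (1.0 - min_hit_rate) * growth)


def suggested_num_pos(num_in_vec: int, num_stages: int, min_hit_rate: float) -> int:
    """numPos = coefficient * Num_in_vec, rounded down."""
    return math.floor(num_in_vec * num_pos_coefficient(num_stages, min_hit_rate))
```

For a 2000-sample vec-file, `suggested_num_pos(2000, 20, 0.995)` gives 1818. Since real samples are not rejected uniformly, this is an estimate to start from, not a guarantee.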

As for tips on studying the traincascade code: as usual, go from the top level down into the details. Classical Haar is the best feature for getting an understanding faster (especially where features are processed by AdaBoost, because Haar features are ordered, not categorical). For optimization, integral images are used intensively in cascades (keep that in mind). Don ...
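The integral-image trick mentioned above can be illustrated in plain Python: after one pass over the image, the sum over any rectangle takes four lookups, which is what makes Haar-like features cheap to evaluate. This is a simplified study sketch (the two-rectangle feature uses plain +1/-1 weights), not OpenCV's implementation; in OpenCV the table itself is produced by `cv::integral`.

```python
from typing import List


def integral_image(img: List[List[int]]) -> List[List[int]]:
    """ii[y][x] = sum of img over rows < y and columns < x (zero-padded)."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii: List[List[int]], x: int, y: int, w: int, h: int) -> int:
    """Sum of the w x h rectangle with top-left corner (x, y): four lookups."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]


def two_rect_haar(ii: List[List[int]], x: int, y: int, w: int, h: int) -> int:
    """Horizontal two-rectangle Haar-like feature: left half minus right half."""
    half = w // 2
    return rect_sum(ii, x, y, half, h) - rect_sum(ii, x + half, y, half, h)
```

The cost of `rect_sum` does not depend on the rectangle size, so large and small features are equally cheap once the table is built.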


Hi, first, congratulations on the posts; they are really good and helped me very much.

But I have some questions. I'm trying to use opencv_traincascade with 100 positive images and 600 negative images. Using createsamples I created 100 samples, and I pass the following parameters:

opencv_traincascade.exe -data data/cascade -vec data/vector.vec -bg negative/infofile.txt -numPos 5 -numNeg 100 -numStages 15 -precalcValBufSize 256 -precalcIdxBufSize 256 -featureType HAAR -mode all -w 30 -h 30 -minHitRate 0.90

but the application just stops executing and closes, without any error message. I tried some variations of -numPos, but it still fails. Can somebody help me?

Thanks...
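It can help to read the sample count from the vec-file header before choosing -numPos. The Python sketch below assumes the header layout written by opencv_createsamples as I understand it (32-bit sample count, 32-bit vector size w*h, then two unused 16-bit fields, little-endian); treat that layout as an assumption. It will not explain a silent exit by itself, but it rules out a -numPos that the vec-file cannot support and a -w/-h that differs from the size used in createsamples. `check_traincascade_args` is a hypothetical helper, and its bound ignores S.

```python
import struct

# Assumed vec header: int32 count, int32 vecsize (= w*h), two unused int16.
VEC_HEADER_FMT = "<iihh"
VEC_HEADER_SIZE = struct.calcsize(VEC_HEADER_FMT)  # 12 bytes


def parse_vec_header(data: bytes):
    """Return (sample_count, vecsize) from the first bytes of a vec-file."""
    if len(data) < VEC_HEADER_SIZE:
        raise ValueError("data too short to contain a vec header")
    count, vecsize, _, _ = struct.unpack(VEC_HEADER_FMT, data[:VEC_HEADER_SIZE])
    return count, vecsize


def read_vec_header(path: str):
    with open(path, "rb") as f:
        return parse_vec_header(f.read(VEC_HEADER_SIZE))


def check_traincascade_args(count, vecsize, num_pos, width, height,
                            num_stages, min_hit_rate):
    """List obvious mismatches between a vec header and traincascade arguments."""
    problems = []
    if vecsize != width * height:
        problems.append(f"vec samples have {vecsize} pixels, -w * -h = {width * height}")
    # Worst-case bound from the formula in this thread, without S.
    needed = num_pos + (num_stages - 1) * (1.0 - min_hit_rate) * num_pos
    if count < needed:
        problems.append(f"vec-file has {count} samples, worst case needs ~{needed:.0f} + S")
    return problems
```

For the parameters in the question, and assuming the 100-sample vec-file was created with -w 30 -h 30, `check_traincascade_args(100, 900, 5, 30, 30, 15, 0.9)` returns an empty list.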


numPos should be your sample count.

(2015-02-03 06:50:00 -0500)

Hi Maria:

Thank you so much for your long and exhaustive reply. I read it carefully, as it deserves proper attention and concentration. I still have doubts, not about the formula itself, which is very clear to me now, but about the methodology. First, the fact that -numPos has to be different from the number of samples in the .vec file is not explained in any official documentation, such as chapter four of opencv_user.pdf, and anybody who reads it gets the message that they are the same (we need to update the documentation).

Second, and more important, the formula, as you say, does not give any systematic and final way to calculate the -numPos value (given an already existing .vec file from a good dataset of positives), because we cannot know the value of S in advance. What we can know is our setting of minHitRate, so if we set minHitRate to 0.9999999…, it seems we will never get any error, because falseNegativeCount will then always be less than one. Another possibility is the one already mentioned, setting numPos = 0.9 x num_in_vec or 0.8 x num_in_vec, but that also looks to me like a trick without any guarantee of success. The big problem here, as you know, is the extremely long computation time for every stage: having the process crash after a few stages means wasting hours of work, with no guarantee that new settings will reach the desired final stage without crashing again.

What I did was to use OpenCV 2.2 (I downgraded!), where you can set numPos = num_in_vec without problems and get your XML classifiers (somehow). Version 2.2, for example, consumes fewer and fewer positives at every stage, discarding the falseNegativeCount samples that you mention.

But now I'd like your kind suggestion about what is better to do: after creating a good dataset of, let's say, 2000 positives and getting the .vec file, what should we do so that the process does not crash? Should we set numPos = 0.9 x num_in_vec, or minHitRate = 0.999999, or use version 2.2, or is there a better option?

One final request, on a different matter: as I'm working on this for my master's thesis, I'd like to study in depth how the traincascade code works with the three different features (Haar, LBP and HOG), so I'd appreciate any suggestions and tips from you on starting to study the C++ code.

Best regards.

Marco Romagnoli


Stats

Asked: 2012-11-20 08:36:28 -0500

Seen: 31,172 times

Last updated: Jan 19 '14