# opencv_createsamples correct parameters

Hello, I am training a classifier for flowers from the 102 Category Flower Dataset. I followed the Coding Robin tutorial for everything:
http://coding-robin.de/2013/07/22/tra...

My first attempt at training failed at stage 0, disappointingly. This time I want to tune the training parameters properly beforehand.
The training guide says to use -bgcolor 0 -bgthresh 0. Can anyone take a look at a few of my training samples and tell me what value I should choose, or how to find the right value? Is -bgcolor a grayscale value or an actual color?
Also, I have 250 positives and 8188 negatives. I plan to use opencv_createsamples to produce 10,000 images (40 for each positive, I guess). Should I go for 10,000 positives (or more, or fewer)?


I would consider making the ratio of positives to negatives 1:2. So if you have 8188 negatives, take approximately 4000 positive samples. You should also consider this if you don't want traincascade to crash: some already-used positive samples can be filtered out by each previous stage (i.e. recognized as background), so if you set -numPos to the maximum number of your positive samples and traincascade rejects some of them, it can cause a crash. Also, I think you have a lot of background in your positive samples; you should consider cropping the flowers out of some of them. But that's just my opinion :)
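To make the crash condition concrete: a rule of thumb often cited on the OpenCV forums (not official documentation, so treat the exact form as an assumption) is that the vec file must contain at least numPos + (numStages - 1) * (1 - minHitRate) * numPos + S samples, where S is a reserve for rejected samples. Rearranging gives a safe upper bound for -numPos:

```python
def safe_num_pos(vec_samples, num_stages=20, min_hit_rate=0.995, reserve=0):
    """Largest -numPos that should not exhaust the vec file.

    Rearranged from the rule of thumb:
        vec_samples >= numPos
                       + (num_stages - 1) * (1 - min_hit_rate) * numPos
                       + reserve
    """
    return int((vec_samples - reserve) /
               (1 + (num_stages - 1) * (1 - min_hit_rate)))

# With a 10,000-sample vec file and default traincascade settings:
print(safe_num_pos(10_000))  # 9132
```

Passing the full vec count (10000) as -numPos leaves no headroom for rejected positives, which is exactly the "insufficient samples" crash described above; leaving a margin below this bound (or using the 1:2 ratio) avoids it.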


I took your (indirectly "their") advice and now go by that formula. I have reduced -numPos to 1521. I hope it works out. With -numPos 10000, the training failed after the zeroth stage with an error message like "unable to get more positive samples....insufficient..."

( 2015-04-24 02:28:20 -0500 )

Also, I set -bgcolor 50 -bgthresh 50 because I examined the area around the flower and noted the range of gray pixels. I wonder whether it will work!
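That eyeballing step can be automated. Here is a minimal sketch (pure NumPy; the helper name is my own, not an OpenCV API) that assumes the border of a grayscale sample is background, then derives -bgcolor as the median border intensity and -bgthresh as the largest deviation from it:

```python
import numpy as np

def estimate_bg_params(gray, border=2):
    """Estimate -bgcolor/-bgthresh from a grayscale sample (H x W uint8).

    Assumes the image border is background: bgcolor is the median border
    intensity, and bgthresh covers the spread around it.
    """
    edges = np.concatenate([
        gray[:border, :].ravel(),   # top rows
        gray[-border:, :].ravel(),  # bottom rows
        gray[:, :border].ravel(),   # left columns
        gray[:, -border:].ravel(),  # right columns
    ]).astype(int)
    bgcolor = int(np.median(edges))
    bgthresh = int(np.abs(edges - bgcolor).max())
    return bgcolor, bgthresh

# Synthetic example: uniform background at intensity 50, bright blob inside
img = np.full((64, 64), 50, dtype=np.uint8)
img[20:44, 20:44] = 200
print(estimate_bg_params(img))  # (50, 0) for this clean background
```

With real photos the border is noisy, so bgthresh comes out larger; running this over a handful of samples and taking the worst-case spread is one way to justify a value like 50/50 rather than guessing.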

( 2015-04-24 02:30:19 -0500 )

@pulp_fiction, the first error with -numPos 10000 is normal, since you simply do not have that many samples in your vec file. Maria's formula is the extreme case; I am sure your data will do fine at the 1:2 ratio!

( 2015-04-24 06:07:21 -0500 )

@StevenPuttemans: Oops! I only read your comment just now; I had already applied Maria's formula and started training with 1521 positives. Since then, 4 stages have passed, and each stage has consumed ~1550 positives. Have I made a blunder? Should I restart the training? Also, the acceptance ratio right now is 0.0500841. What would be a good stopping point?

( 2015-04-24 09:39:02 -0500 )

No, you should not restart; first see how well this performs. As you can see, positives are reused unless they are discarded and replaced. In the worst case all positives get discarded, but that almost never happens. Normally you would see something like 1500:1540 for the positives, which means that 40 positives have already been rejected to avoid overfitting the data. As for the acceptanceRatio, I always train down to a level of 10e-5, though not everyone agrees with me on that. I think that is the point where further training would lead to overfitting of the model.

( 2015-04-25 03:24:37 -0500 )
