How to train classifiers the best way possible
Hello,
I am trying to train a cascade classifier with opencv_traincascade. There are a few tutorials and posts about that, but all of them seem to be either outdated or contradictory. Also, I am interested in doing things the best way possible, not just getting them to work. I am going to tell you what I have learned so far, and what is unclear to me.
Acquiring positive samples
There are two methods:
Using a single image (or a few images) and OpenCV's createsamples utility to generate a large number of samples from it by applying random distortions. In case of more than one source image, the resulting *.vec files are merged with a special utility.
Using multiple annotated images and feeding them to createsamples with the -info option. This does not create more samples; it just extracts them. (Both invocations are sketched below.)
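For concreteness, here is a minimal sketch of both invocations, driven from Python via subprocess. All file names, counts, and the 24x24 window size are placeholders I made up for illustration, not values from this thread:

```python
import subprocess

W, H = 24, 24  # hypothetical training window size

# Method 1: synthesize many samples from a single object image by applying
# random distortions and placing the result on background images.
subprocess.run([
    "opencv_createsamples",
    "-img", "object.png",        # one positive source image (placeholder)
    "-bg", "negatives.txt",      # list of background images, one path per line
    "-vec", "object.vec",
    "-num", "1000",
    "-maxxangle", "0.5", "-maxyangle", "0.5", "-maxzangle", "0.5",
    "-w", str(W), "-h", str(H),
], check=True)

# Method 2: extract the annotated regions from real images. positives.info
# uses the documented annotation format, one line per image:
#   relative/path.jpg <object_count> <x> <y> <width> <height> [...]
subprocess.run([
    "opencv_createsamples",
    "-info", "positives.info",
    "-vec", "positives.vec",
    "-num", "300",               # should match the number of annotations
    "-w", str(W), "-h", str(H),
], check=True)
```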
The first approach seems to be used by a lot of people, mainly because you don't need annotated images and can create a large number of samples from just a few source images. But I've read that the first approach is actually bad, and that one should use the second approach because it provides real-world samples that are more likely to match the input images in a real application. Some people say that a smaller number of real-world images is far better than a really large number of artificially generated samples.
I currently have about 300 annotated samples at hand. I could directly feed them to createsamples (with -info).
- 300 samples are not a huge amount, but they are "real world" samples. Referring to the statements above, I might not need more. By how much are real-world samples actually better than generated ones? Are 300 samples enough?
- Otherwise, would it make sense to generate about 10 artificial samples per real-world sample using createsamples and merge the resulting *.vec files? That way I would have 3000 samples. (A sketch of this follows below.)
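A rough sketch of that hybrid idea, assuming the annotations are available as (path, x, y, width, height) tuples (format and paths are hypothetical). Merging the resulting per-image *.vec files would still be done with the merge utility mentioned above:

```python
import os
import subprocess
import cv2

W, H = 24, 24  # hypothetical training window size

# Hypothetical annotation list: (image path, x, y, width, height).
annotations = [
    ("raw/img_000.jpg", 140, 100, 45, 45),
    # ... one entry per annotated object, ~300 in total
]

os.makedirs("crops", exist_ok=True)
os.makedirs("vecs", exist_ok=True)

for i, (path, x, y, w, h) in enumerate(annotations):
    img = cv2.imread(path)
    crop = img[y:y + h, x:x + w]          # cut out the annotated object
    crop_path = f"crops/pos_{i:04d}.png"
    cv2.imwrite(crop_path, crop)

    # Let createsamples produce 10 randomly distorted variants of this
    # one real sample; one .vec file per source sample.
    subprocess.run([
        "opencv_createsamples",
        "-img", crop_path,
        "-bg", "negatives.txt",           # background list, one path per line
        "-vec", f"vecs/pos_{i:04d}.vec",
        "-num", "10",
        "-w", str(W), "-h", str(H),
    ], check=True)
```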
Acquiring negative samples
Most people use random images from the internet as negatives for training. But I read through this article, which suggests a better approach:
Creating negatives from the backgrounds of the positives is much more “natural” and will give far better results, than using a wild list of background images taken from the Internet.
Also:
I would avoid leaving OpenCV training algorithm create all the negative windows. To do that, create the background images at the final training size so that it cannot subsample but only take the entire negative image as a negative.
If I understand this correctly, the author suggests taking the images that contain the object and extracting negative samples from the regions around the object, making sure those extracted negatives have exactly the training window size, so that the training algorithm cannot subsample them (a rough sketch of this follows below).
Does that make sense? I looked at his git repository, and his negatives were a lot larger than the training window. Am I missing something?
Also, should I extract regions at exactly the window size, or could I use larger portions with the same aspect ratio and scale them down?
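To make my understanding concrete, here is a minimal sketch of how such negatives could be extracted with OpenCV's Python bindings. The annotation format, paths, and the number of windows per image are assumptions for illustration, not taken from the article:

```python
import os
import random
import cv2

W, H = 24, 24        # training window size
NEG_PER_IMAGE = 20   # windows to sample per image (arbitrary choice)

# Hypothetical annotations: (path, object x, y, width, height).
annotations = [
    ("raw/img_000.jpg", 140, 100, 45, 45),
]

os.makedirs("neg", exist_ok=True)
lines = []
n = 0
for path, ox, oy, ow, oh in annotations:
    img = cv2.imread(path)
    ih, iw = img.shape[:2]
    kept, tries = 0, 0
    while kept < NEG_PER_IMAGE and tries < 1000:
        tries += 1
        x = random.randint(0, iw - W)
        y = random.randint(0, ih - H)
        # Reject windows that overlap the annotated object box, so only
        # background regions become negatives.
        if x < ox + ow and x + W > ox and y < oy + oh and y + H > oy:
            continue
        out = f"neg/neg_{n:05d}.png"
        cv2.imwrite(out, img[y:y + H, x:x + W])
        lines.append(out)
        n += 1
        kept += 1

# bg.txt for traincascade: one negative image path per line.
with open("bg.txt", "w") as f:
    f.write("\n".join(lines) + "\n")
```

Since every window is written at exactly W x H, the trainer can only take each file whole, which seems to be what the quoted advice is after.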
Maybe @StevenPuttemans can evaluate your parameters; he has more experience with traincascade.
@Pedro Batista, thank you for calling me to this topic :) About the parameters, I will take a look and add more info below your answer if needed!