How many negative and positive images should I have?
Should there be more positives than negatives? If yes, what is the best proportion between negatives and positives?
As said above, that kind of depends on your application. A ratio of 1:2 (positives:negatives) is a good starting point, but I have applications where the ratio is 100:1 and applications where it is 1:100. It all makes sense once you know what the training does for your application.
Is there a preferable format for the pictures (bmp, jpg, png, etc.)?
Actually there is no general rule for this, but I always suggest that users use a file format without lossy compression, like png. This ensures that you do not incorporate compression artefacts as actual features of the object. This is especially important when resizing your training data.
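If your captures are jpg anyway, one possible workflow (just a suggestion, assuming ImageMagick is installed; nothing in the OpenCV tools requires it) is to convert everything to png once, before any editing or resizing:

```bash
# Create a .png copy of every .jpg in the folder (ImageMagick).
# Note: this does not remove artefacts already baked into the jpg files,
# it only prevents new ones from being added by later saves and resizes.
mogrify -format png *.jpg
```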
What should be the size of negative pictures and what should be the size of positive images?
Let's say my negative images are 640x320 and the object to be detected is 100x50. Should all the images in the negatives folder be 640x320? Should the positives folder contain 640x320 cropped images with the object visible? Or should I place 100x50 images containing only the object in the positives folder?
In your positives folder you keep images that contain the objects. Your positives.txt file is then formatted as `image_location number_objects x1 y1 w1 h1 x2 y2 w2 h2 ... xN yN wN hN`. Those regions will be cut out by the create samples tool and resized to the model dimensions given by the `-w` and `-h` parameters of the tool. For your negative images, just supply a folder with tons of images that are larger than the model size you selected. During training, negative windows are sampled from those larger images, so the number of `-numNeg` windows can be quite a bit larger than the actual number of images.
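For illustration, a positives.txt in that format could look like this (file names and coordinates are invented for the example; the second image has two annotated objects):

```
positives/img001.png 1 120 80 100 50
positives/img002.png 2 30 40 100 50 300 160 100 50
```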
When cropping positive images, should I clear everything from the background? Or should I use just a rectangle around the object, including some of the surrounding background?
You need to include application-specific background information in order to get a well performing detector. This means you gather positives in the setup where your application eventually needs to work. An object detector that is invariant to the application will literally require collecting thousands, up to millions, of training samples. This is for example the case in state-of-the-art pedestrian detectors.
I tried to use the "famous" imageclipper program, with no luck. Has anyone done it? Is there any walk-through tutorial for installing this program?
Imageclipper is not an official part of OpenCV. Though no documentation exists for it yet (still working on that), the latest 2.4 and 3.0 branches contain an OpenCV-specific annotation tool that can be used to make your annotations. Take a look! The tool is built automatically when building all tools with CMake. It can then be used from the command line via the `opencv_annotation` command.
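As a rough sketch of its usage (the flag names below are what I believe the 3.x tool accepts; run the tool without arguments to see the exact interface of your build):

```bash
# Opens each image in positives/ and lets you draw bounding boxes;
# the annotations end up in positives.txt in the format described above.
opencv_annotation --annotations=positives.txt --images=positives/
```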
Opencv_createsamples: Is it necessary?
YES, it is needed, unless you want to manually create the data vectors that it generates. The complete OpenCV implementation depends on those vectorized training data structures to be successful.
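A minimal sketch of this step (the values are illustrative, not recommendations; the 48x24 model size keeps the 2:1 w:h ratio of the 100x50 objects from the earlier question):

```bash
# Cut the annotated regions out of the positive images, resize them to
# the model dimensions (-w x -h) and pack them into the binary .vec file
# that opencv_traincascade consumes.
opencv_createsamples -info positives.txt -vec positives.vec \
                     -num 400 -w 48 -h 24
```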
How many samples should I use? What about -w and -h ...? Is this going to affect the training and, finally, the detection procedure?
As said before, `-w` and `-h` are the model dimensions, to which each new window is resized before training and which are used to grab negative windows. No, you do not need to manually resize everything. The only thing you should guarantee is that the w:h ratio of your objects is about the same as the w:h ratio of the model dimensions.
Effect on the training procedure: larger model dimensions generate far more features to evaluate, so training takes longer and needs more memory.
Effect on the detection procedure: the model dimensions are the smallest object size the detector can find, since detection scales the search window up from the model size, never below it.
opencv_traincascade: Below are all the parameters:
This I will not do; all the explanations of these parameters are here. The specifics of each parameter are endless ... so basically it depends on what you want. Reading the Viola and Jones paper on the framework will also clear a lot of things up.
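Just to give a feel for it, a typical invocation could look like this (all values are illustrative and should be tuned to your data):

```bash
# The -data directory must exist beforehand (mkdir cascade).
# negatives.txt is a plain text file listing one negative image path per line.
# -numPos is set a bit below the number of samples in the .vec file,
# because some positives get consumed at every stage.
opencv_traincascade -data cascade/ \
                    -vec positives.vec -bg negatives.txt \
                    -numPos 350 -numNeg 700 -numStages 15 \
                    -w 48 -h 24 \
                    -minHitRate 0.995 -maxFalseAlarmRate 0.5
```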
During training I am getting this ... Can anyone explain that table and all the other information?
===== TRAINING 0-stage =====
POS count : consumed 400 : 400
NEG count : acceptanceRatio 1444 : 1
Precalculation time: 12
+----+---------+---------+
|  N |    HR   |    FA   |
+----+---------+---------+
|   1|        1|        1|
+----+---------+---------+
|   2|        1|        1|
+----+---------+---------+
|   3|        1| 0.454986|
+----+---------+---------+
Training until now has taken 0 days 0 hours 20 minutes 11 seconds.
Basically, that is your training output. Each stage is a combination of weak classifiers, added until the stage reaches the desired accuracy, defined by the maximum false alarm rate and the minimum hit rate. What you see for each stage is the number of samples grabbed, the classification accuracy on the negatives (the acceptanceRatio), and then, for each weak classifier N added to the stage, the resulting hit rate (HR) and false alarm rate (FA). Again, to understand this, please read the paper!
After training: I trained my classifier for 5 stages and was able to find some objects in the image (with a lot of mistakes, of course); then I trained it for 8 stages (with no improvement); then I trained it for 12 stages and, for some weird reason, it can't find any object in the image, nothing.
My first guess (although there can be many reasons) is that you have a small set of training data. What your model is telling you is that, with so few samples, the later stages become overly strict and overfit to the training set, so at detection time every window gets rejected. So basically, more training data is needed!