How many negative and positive images should I have?
Should there be more positives than negatives? If yes, what is the best proportion between negatives and positives?
As said above, that kind of depends on your application. A ratio of 1:2 (positives:negatives) is a good starting point, but I have applications where the ratio is 100:1 and applications where it is 1:100. It all makes sense once you know what the training does for your application.
Is there a preferable format for the pictures (bmp, jpg, png, etc.)?
Actually there is no general rule for this, but I always suggest that users use a file format without lossy compression, like png. This ensures that you do not incorporate compression artefacts as actual features of the object. This is especially important when resizing your training data.
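If your captures are jpg anyway, one possible workflow (just a suggestion, assuming ImageMagick is installed; nothing in the OpenCV tools requires it) is to convert everything to png once, before any editing or resizing:

```bash
# Create a .png copy of every .jpg in the folder (ImageMagick).
# Note: this does not remove artefacts already baked into the jpg files,
# it only prevents new ones from being added by later saves and resizes.
mogrify -format png *.jpg
```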
What should be the size of negative pictures and what should be the size of positive images?
Let's say my negative images are 640x320 and the object to be detected is 100x50. Should all the images in the negatives folder be 640x320? Should the positives folder contain 640x320 cropped images with the object visible? Or should I place 100x50 images containing only the object in the positives folder?
In your positives folder you keep images that contain the objects. Your positives.txt file is then formatted as `image_location number_objects x1 y1 w1 h1 x2 y2 w2 h2 ... xN yN wN hN`. Those regions will be cut out by the create samples tool and resized to the model dimensions given by the `-w` and `-h` parameters of the tool. For your negative images, just supply a folder with tons of images that are larger than the model size you selected. During training, negative windows are sampled from those larger images, so the number of `-numNeg` windows can be quite a bit larger than the actual number of images.
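For illustration, a positives.txt in that format could look like this (file names and coordinates are invented for the example; the second image has two annotated objects):

```
positives/img001.png 1 120 80 100 50
positives/img002.png 2 30 40 100 50 300 160 100 50
```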
When cropping positive images, should I clear everything from the background? Or should I use just a rectangle around the object, including some of the surrounding background?
You need to include application-specific background information in order to get a well performing detector. This means you gather positives in the setup where your application eventually needs to work. An object detector that is invariant to the application will literally require collecting thousands, up to millions, of training samples. This is for example the case in state-of-the-art pedestrian detectors.
I tried to use the "famous" imageclipper program, with no luck. Has anyone done it? Is there any walk-through tutorial for installing this program?
Imageclipper is not an official part of OpenCV. Though no documentation exists for it yet (still working on that), the latest 2.4 and 3.0 branches contain an OpenCV-specific annotation tool that can be used to make your annotations. Take a look! The tool is built automatically when building all tools with CMake. It can then be used from the command line via the `opencv_annotation` command.
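As a rough sketch of its usage (the flag names below are what I believe the 3.x tool accepts; run the tool without arguments to see the exact interface of your build):

```bash
# Opens each image in positives/ and lets you draw bounding boxes;
# the annotations end up in positives.txt in the format described above.
opencv_annotation --annotations=positives.txt --images=positives/
```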
Opencv_createsamples: Is it necessary?
YES, it is needed, unless you want to manually create the data vectors that it generates. The complete OpenCV implementation depends on those vectorized training data structures to be successful.
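A minimal sketch of this step (the values are illustrative, not recommendations; the 48x24 model size keeps the 2:1 w:h ratio of the 100x50 objects from the earlier question):

```bash
# Cut the annotated regions out of the positive images, resize them to
# the model dimensions (-w x -h) and pack them into the binary .vec file
# that opencv_traincascade consumes.
opencv_createsamples -info positives.txt -vec positives.vec \
                     -num 400 -w 48 -h 24
```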
How many samples should I use? What about -w and -h ...? Is this going to affect the training and, finally, the detection procedure?
As said before, `-w` and `-h` are the model dimensions, to which each new window is resized before training and which are used to grab negative windows. No, you do not need to manually resize everything. The only thing you should guarantee is that the w:h ratio of your objects is about the same as the w:h ratio of the model dimensions.
Effect on the training procedure: larger model dimensions generate far more features to evaluate, so training takes longer and needs more memory.
Effect on the detection procedure: the model dimensions are the smallest object size the detector can find, since detection scales the search window up from the model size, never below it.
opencv_traincascade: Below are all the parameters:
This I will not do; all the explanations of these parameters are here. The specifics of each parameter are endless ... so basically it depends on what you want. Reading the Viola and Jones paper on the framework will also clear a lot of things up.
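Just to give a feel for it, a typical invocation could look like this (all values are illustrative and should be tuned to your data):

```bash
# The -data directory must exist beforehand (mkdir cascade).
# negatives.txt is a plain text file listing one negative image path per line.
# -numPos is set a bit below the number of samples in the .vec file,
# because some positives get consumed at every stage.
opencv_traincascade -data cascade/ \
                    -vec positives.vec -bg negatives.txt \
                    -numPos 350 -numNeg 700 -numStages 15 \
                    -w 48 -h 24 \
                    -minHitRate 0.995 -maxFalseAlarmRate 0.5
```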
During training I am getting this ... Can anyone explain that table and all the other information?
===== TRAINING 0-stage =====
POS count : consumed 400 : 400
NEG count : acceptanceRatio 1444 : 1
Precalculation time: 12
+----+---------+---------+
|  N |    HR   |    FA   |
+----+---------+---------+
|   1|        1|        1|
+----+---------+---------+
|   2|        1|        1|
+----+---------+---------+
|   3|        1| 0.454986|
+----+---------+---------+
Training until now has taken 0 days 0 hours 20 minutes 11 seconds.
Basically, that is your training output. Each stage is a combination of weak classifiers, added until the stage reaches the desired accuracy, defined by the maximum false alarm rate and the minimum hit rate. What you see for each stage is the number of samples grabbed, the classification accuracy on the negatives (the acceptanceRatio), and then, for each weak classifier N added to the stage, the resulting hit rate (HR) and false alarm rate (FA). Again, to understand this, please read the paper!
After training: I trained my classifier for 5 stages and was able to find some objects in the image (with a lot of mistakes, of course); then I trained it for 8 stages (with no improvement); then I trained it for 12 stages and, for some weird reason, it can't find any object in the image, nothing.
My first guess (although there can be many reasons) is that you have a small set of training data. What your model is telling you is that, with so few samples, the later stages become overly strict and overfit to the training set, so at detection time every window gets rejected. So basically, more training data is needed!