Basically, what you are doing wrong is using the opencv_createsamples utility to introduce artificial transformations. This works in clean, lab-like environments, but it produces features that are unrealistic for the object in real-life situations. Start by removing that step and gather positive samples the hard way: collect thousands of original training images, capturing as much natural variation as possible in the samples.
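As a sketch of what "the hard way" looks like on the tooling side, opencv_traincascade ultimately consumes an annotation file (passed to opencv_createsamples via -info) with one line per image: the path, the number of objects, and one bounding box each. The image paths and boxes below are made-up placeholders; a small helper like this can generate the file from your own annotations:

```python
# Write a positives description file in the format expected by
# opencv_createsamples -info: "<path> <count> <x y w h> ...".
# The image paths and bounding boxes here are hypothetical examples.

def write_info_file(annotations, out_path):
    """annotations: dict mapping image path -> list of (x, y, w, h) boxes."""
    with open(out_path, "w") as f:
        for img_path, boxes in sorted(annotations.items()):
            coords = " ".join(f"{x} {y} {w} {h}" for x, y, w, h in boxes)
            f.write(f"{img_path} {len(boxes)} {coords}\n")

annotations = {
    "pos/img001.jpg": [(140, 100, 45, 45)],
    "pos/img002.jpg": [(10, 20, 50, 50), (200, 30, 48, 48)],
}
write_info_file(annotations, "positives.info")
```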
You will also need to introduce far more negatives. You want to detect an object instance in arbitrary situations, where the variation in possible backgrounds is huge. Think of it this way: you should try to cover every possible background with your negatives. This is why many object models for in-the-wild detection come with huge negative sets, on the order of several 100,000s of samples. The advantage is that you can provide a wide range of negative images that are larger than your model window, and the algorithm will take random sub-windows from those images. That means 5000 high-resolution images could easily give you 150,000 windows to train on as negatives.
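To get a feel for those numbers, here is a quick back-of-the-envelope count of how many model-sized windows a single large negative image can contribute at one scale with a non-overlapping stride. The 24x24 model size and Full HD resolution are illustrative assumptions, not values from the answer above; traincascade also samples across scales, so the real pool is even larger:

```python
# Count how many non-overlapping model-sized windows fit in one
# negative image at a single scale. The 24x24 model size and the
# 1920x1080 resolution are assumptions chosen for illustration.

def count_windows(img_w, img_h, win_w, win_h, stride):
    cols = (img_w - win_w) // stride + 1
    rows = (img_h - win_h) // stride + 1
    return cols * rows

per_image = count_windows(1920, 1080, 24, 24, 24)
print(per_image)         # windows available in one Full HD image
print(5000 * per_image)  # pool available across 5000 such images
```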
Aside from that, parameter tweaking is always one of the more time-consuming steps. Each application has its own set of specific settings that gives the best result. This is a long period of trial and error, I am afraid, though it can be partially automated.
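Part of that trial and error can be scripted: generate one opencv_traincascade command line per parameter combination and run them in sequence, comparing the resulting models. The flag names (-minHitRate, -maxFalseAlarmRate, -numStages) are real traincascade parameters, but the value grids and file paths below are arbitrary examples, not recommendations:

```python
# Build (but do not execute) one opencv_traincascade command line per
# parameter combination. The value grids and the paths "model/",
# "pos.vec" and "negatives.txt" are placeholder examples.
import itertools

grid = {
    "-minHitRate": [0.995, 0.999],
    "-maxFalseAlarmRate": [0.3, 0.5],
    "-numStages": [15, 20],
}

def build_commands(grid):
    keys = list(grid)
    commands = []
    for combo in itertools.product(*(grid[k] for k in keys)):
        cmd = ["opencv_traincascade", "-data", "model/",
               "-vec", "pos.vec", "-bg", "negatives.txt"]
        for key, value in zip(keys, combo):
            cmd += [key, str(value)]
        commands.append(cmd)
    return commands

cmds = build_commands(grid)
print(len(cmds))  # 2 * 2 * 2 = 8 combinations to try
```

Each entry in `cmds` could then be handed to `subprocess.run` with a different `-data` directory per combination, so the trained models can be compared afterwards.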
Also, use LBP features until you reach a somewhat decent model. Training is about ten times faster and detection is faster as well, mainly because both steps use integer operations only. It will get models with a lot of samples trained in days rather than weeks compared to HAAR features.
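For reference, switching to LBP is a single flag on opencv_traincascade; the paths, sample counts and window size in this invocation are placeholder values, not recommendations:

```shell
# -featureType LBP switches from the default HAAR features to LBP.
# Paths, sample counts and the 24x24 window size are placeholders.
opencv_traincascade -data model/ \
    -vec pos.vec -bg negatives.txt \
    -numPos 4000 -numNeg 15000 \
    -w 24 -h 24 \
    -featureType LBP
```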
About the bonus questions: