- Your questions #1 and #3 are related. The resolution of your positive samples simply needs to be at least as large as the `w` and `h` parameters used in training, because all positives are resized to that resolution anyway. A "good" positive is not one with high resolution; what matters is that every sample is positioned carefully. As you probably know, traincascade computes features, and each feature has a fixed position within the patch. That means the features of your object should sit at the same relative position in every sample. For faces, for instance, you can put the nose at the center. This is especially important for profile faces: a face aligned to the right of the crop and one aligned to the left look like different objects to the trainer, even though they are really the same object shifted, which the sliding window would find anyway. So think carefully about how you crop your positive images; they should usually be centered in a uniform way. On the other hand, you have to allow some variation in the training set: the nose should float around the center, and the faces should be slightly rotated.
- First of all, use all available negatives. Your idea of using other objects as negatives is good. An important aspect is to include many images of the backgrounds your objects naturally appear in. Human faces can be seen everywhere, but you should definitely have indoor images; for animals you need forests, deserts, etc. But do not limit yourself to natural backgrounds only.
- The sample size is related to the size of the important features of your object. In my opinion, you should choose the smallest size that still preserves all the important gradients of the object. If the size is too large, you feed in redundant information, which is similar to overfitting. For faces, you need the gradient between the eyes and the forehead, but the skin between the eyes and the eyebrows is usually not important. Also keep in mind that your detector will not find objects smaller than this size, which is a second reason to keep it small.
- Your statement about `groupRectangles` is generally not true. With a reasonable `eps`, the function groups only overlapping rectangles, and the algorithm is still able to return multiple objects. It is usually quite simple to find a proper `eps` for your particular case.
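To make the cropping advice in the first point concrete, here is a minimal Python sketch, assuming you have already annotated one anchor landmark per positive (the nose tip, say). The helper `jittered_crop` and its default ranges (a shift of up to 5% of the box size, rotation of up to ±5°) are hypothetical choices for illustration, not something traincascade provides.

```python
import random
from typing import Tuple


# Hypothetical helper: the name and the default jitter/rotation ranges are
# illustrative assumptions, not part of OpenCV or traincascade.
def jittered_crop(
    anchor: Tuple[float, float],
    box_size: float,
    rng: random.Random,
    max_shift_frac: float = 0.05,
    max_angle_deg: float = 5.0,
) -> Tuple[Tuple[int, int, int, int], float]:
    """Return ((x, y, w, h), angle) for a square crop centred near `anchor`.

    The anchor (e.g. the nose tip) is placed at the crop centre and then
    moved by at most `max_shift_frac * box_size` along each axis; a small
    rotation angle is drawn as well, so the set keeps some variation.
    """
    ax, ay = anchor
    max_shift = max_shift_frac * box_size
    # Let the anchor "float" around the centre instead of sitting exactly on it.
    cx = ax + rng.uniform(-max_shift, max_shift)
    cy = ay + rng.uniform(-max_shift, max_shift)
    angle = rng.uniform(-max_angle_deg, max_angle_deg)
    side = int(round(box_size))
    x = int(round(cx - side / 2))
    y = int(round(cy - side / 2))
    return (x, y, side, side), angle


if __name__ == "__main__":
    rng = random.Random(42)
    for _ in range(3):
        print(jittered_crop((120.0, 90.0), 48.0, rng))
```

Each crop would then be rotated by `angle` about its center and resized to the training `w`×`h`, for example with OpenCV's `cv2.getRotationMatrix2D`, `cv2.warpAffine` and `cv2.resize`. Seeding the generator keeps the augmented set reproducible between training runs.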
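For the negatives in the second point, traincascade reads a background description file (passed with `-bg`) that lists one image path per line. A minimal sketch that gathers every image from several folders might look like this; the folder names in the example and the set of accepted suffixes are assumptions.

```python
from pathlib import Path
from typing import Iterable, List

# Assumed set of image extensions to pick up; adjust to your data.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".bmp", ".pgm", ".ppm", ".tif", ".tiff"}


def write_background_list(roots: Iterable[str], out_file: str) -> List[str]:
    """Collect every image under each root directory (recursively) and
    write the paths to `out_file`, one per line, as a -bg description file."""
    paths: List[str] = []
    for root in roots:
        for p in sorted(Path(root).rglob("*")):
            if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES:
                paths.append(str(p))
    Path(out_file).write_text("\n".join(paths) + ("\n" if paths else ""))
    return paths


if __name__ == "__main__":
    # Hypothetical layout: one folder per kind of background.
    found = write_background_list(["negatives/indoor", "negatives/forest"], "bg.txt")
    print(f"{len(found)} negative images listed")
```

Keeping one folder per background type (indoor, forest, desert, other objects) makes it easy to check that no category is badly under-represented before training.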
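The second reason given in the third point, that objects smaller than the training size are missed, follows from how the multiscale search works. The sketch below is a simplified model with a hypothetical helper name: it grows the window from the training size by `scale_factor` until it no longer fits. OpenCV's `detectMultiScale` actually shrinks the image while keeping the window fixed, which is equivalent for this purpose; in both cases the smallest effective window is the training size itself.

```python
from typing import List, Tuple


def scanned_window_sizes(
    train_w: int, train_h: int, img_w: int, img_h: int, scale_factor: float = 1.1
) -> List[Tuple[int, int]]:
    """Window sizes visited by a simplified multiscale sliding-window scan.

    The scan starts at the training size and grows by `scale_factor` until
    the window no longer fits in the image, so nothing smaller than
    (train_w, train_h) is ever examined.
    """
    if scale_factor <= 1.0:
        raise ValueError("scale_factor must be greater than 1")
    sizes: List[Tuple[int, int]] = []
    scale = 1.0
    while True:
        w, h = round(train_w * scale), round(train_h * scale)
        if w > img_w or h > img_h:
            break
        sizes.append((w, h))
        scale *= scale_factor
    return sizes


if __name__ == "__main__":
    print(scanned_window_sizes(24, 24, 100, 100)[:4])
```

For a 24×24 cascade the first size is always (24, 24), so a 20-pixel face is never examined; training at a larger size would push this lower limit up accordingly.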
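To see why a reasonable `eps` still lets `groupRectangles` return several objects, here is a simplified Python re-implementation of its grouping idea. The similarity test mirrors the tolerance OpenCV uses (`delta = eps * (min of widths + min of heights) / 2`, checked against all four edges), and groups with `group_threshold` or fewer members are dropped. The helper names are hypothetical, and OpenCV's extra step of discarding small groups nested inside larger ones is omitted, so treat this as an illustration rather than a drop-in replacement.

```python
from typing import Dict, List, Tuple

Rect = Tuple[int, int, int, int]  # (x, y, w, h)


def similar(r1: Rect, r2: Rect, eps: float) -> bool:
    """True if every edge of r1 lies within delta of the matching edge of r2."""
    delta = eps * (min(r1[2], r2[2]) + min(r1[3], r2[3])) * 0.5
    return (
        abs(r1[0] - r2[0]) <= delta
        and abs(r1[1] - r2[1]) <= delta
        and abs(r1[0] + r1[2] - r2[0] - r2[2]) <= delta
        and abs(r1[1] + r1[3] - r2[1] - r2[3]) <= delta
    )


def group_rectangles(rects: List[Rect], group_threshold: int, eps: float = 0.2) -> List[Rect]:
    """Cluster similar rectangles, drop small clusters, average the rest."""
    parent = list(range(len(rects)))

    def find(i: int) -> int:
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(rects)):
        for j in range(i + 1, len(rects)):
            if similar(rects[i], rects[j], eps):
                parent[find(i)] = find(j)

    clusters: Dict[int, List[Rect]] = {}
    for i, r in enumerate(rects):
        clusters.setdefault(find(i), []).append(r)

    result: List[Rect] = []
    for members in clusters.values():
        n = len(members)
        if n <= group_threshold:
            continue  # too few raw detections to trust this group
        avg = tuple(round(sum(r[k] for r in members) / n) for k in range(4))
        result.append(avg)  # type: ignore[arg-type]
    return result


if __name__ == "__main__":
    raw = [(10, 10, 50, 50), (12, 11, 50, 50), (11, 9, 52, 50),
           (200, 200, 50, 50), (203, 201, 50, 50), (201, 199, 48, 50),
           (400, 50, 30, 30)]
    print(group_rectangles(raw, group_threshold=1))
```

With the sample input, the two tight clusters come back as two separate objects and the lone detection is dropped. With `eps=0.0` nothing merges, so every group has one member and all are discarded; a very large `eps` could instead merge neighboring objects into one.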
So the real secret lies in collecting and cropping your positives. For non-rigid objects you need thousands of them, and you have to find a compromise between uniform positioning and added variance. For negatives, use as many images as you have; you may need a couple of thousand.
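Once the positives are annotated, `opencv_createsamples` can cut each box out and resize it to `-w`×`-h` for you when given an info file (via `-info`, together with `-vec`) whose lines have the form `path count x y w h [x y w h ...]`. A small sketch that formats such lines from a dictionary of annotated boxes is shown below; the dictionary layout and the helper name are assumptions for illustration.

```python
from typing import Dict, List, Tuple

Box = Tuple[int, int, int, int]  # (x, y, w, h) in pixels


def format_info_lines(annotations: Dict[str, List[Box]]) -> List[str]:
    """Turn {image_path: [boxes]} into info-file lines: path count x y w h ..."""
    lines: List[str] = []
    for path, boxes in annotations.items():
        if not boxes:
            continue  # images without the object belong in the negatives list
        coords = " ".join(f"{x} {y} {w} {h}" for x, y, w, h in boxes)
        lines.append(f"{path} {len(boxes)} {coords}")
    return lines


if __name__ == "__main__":
    ann = {"pos/a.jpg": [(10, 20, 40, 40)], "pos/b.jpg": [(0, 0, 24, 24), (30, 5, 24, 24)]}
    with open("info.dat", "w") as f:
        f.write("\n".join(format_info_lines(ann)) + "\n")
```

Because the boxes in this file decide where the object sits inside each training sample, this is the place where the uniform-positioning-plus-variance compromise is actually applied.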