Improve Object Detection Quality

asked 2012-08-02 08:50:43 -0600

Bia6969

updated 2012-08-04 02:14:04 -0600

Kirill Kornyakov

I have a few basic questions regarding the object detection available on OpenCV.

  1. It is said all over the internet that a good positive folder is a huge step in the traincascade process: good images may lead to good results. My question is: is there an optimal resolution these positive images should have?

  2. About the negative folder, most people say "random images not containing the object you want to recognize". Should I put images of other trained objects in it? Or should I put completely unrelated images, like forests and so on? I'm a little lost regarding the contents of this folder.

  3. When I create the vector file, how important is the size I pick? Are bigger -w and -h parameters better, or does it not really matter? I always use 32, but I have tried -w 50 -h 50 and it gave a better result.

  4. About groupRectangles(...): what should the second and third arguments be? I guess that if I use this function to reduce redundant rectangles, I will DISABLE the possibility of detecting two instances of the same object in the same picture, right?


I know someone who spent half a year, full time, just optimizing all those parameters for a given detection task by trial and error.

sammy (2012-08-02 09:05:40 -0600)

Well, that's motivating ;)

Bia6969 (2012-08-02 10:33:25 -0600)

1 answer

answered 2012-08-04 02:11:26 -0600

Kirill Kornyakov
  1. Your questions #1 and #3 are related to each other. The resolution of the positive samples just has to be larger than the w and h parameters used during training; all your positives are simply resized to w x h. A "good" positive doesn't mean a good resolution. What matters is that you position every sample carefully. As you probably know, traincascade calculates features, and these features have a position within the patch. That means the features of your objects should be located at the same relative position within every sample. For instance, for faces you can put the nose at the center. This is especially important for profile faces, because faces aligned to the right and faces aligned to the left effectively become different objects for the classifier, even though they are really the same object shifted, which the sliding window would find anyway. So you have to think about how to crop your positive images; they should usually be centered in a uniform way. On the other hand, you have to allow some variation in the training set: the nose should float around the center, and the faces should be slightly rotated.
  2. First of all, you should use all the negatives available to you. The idea of using images of other objects is good. Another important aspect is to use many images of the natural backgrounds in which your objects appear. Human faces can be seen everywhere, but you should definitely include indoor images; for animals you need forests, deserts, etc. But do not limit yourself to natural backgrounds only.
  3. The size is related to the size of the important features of your object. In my opinion, you should choose the smallest size that still preserves all the important gradients of the object. If you use too large a size, you feed in redundant information, which is similar to overfitting. For faces, you need the gradient between the eyes and the forehead, but the skin between the eyes and the eyebrows is usually not important. Also keep in mind that your detector will not find objects smaller than this size, which is the second reason to keep it small.
  4. Your statement about groupRectangles is generally not true. With a reasonable eps, the function groups only overlapping rectangles, so the algorithm is still able to return multiple objects. It is usually quite simple to find a proper eps for your particular case.
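To illustrate item 4, here is a simplified, dependency-free sketch of the grouping rule described in the OpenCV documentation for `groupRectangles(rectList, groupThreshold, eps)` (in Python: `cv2.groupRectangles`). The second argument, `groupThreshold`, is the minimum number of rectangles minus one that a cluster needs in order to be kept; the third, `eps`, is the relative difference between rectangle sides that still counts as "the same" detection. The similarity test is modeled on OpenCV's edge-distance check, but this sketch leaves out the library's extra step that drops small clusters nested inside larger ones, so treat it as an illustration, not a replacement. The names `similar` and `group_rectangles` and the sample rectangles are hypothetical.

```python
from typing import Dict, List, Tuple

Rect = Tuple[int, int, int, int]  # (x, y, width, height)


def similar(r1: Rect, r2: Rect, eps: float) -> bool:
    """True if every edge of r1 lies within delta pixels of the matching edge of r2."""
    delta = eps * (min(r1[2], r2[2]) + min(r1[3], r2[3])) * 0.5
    return (abs(r1[0] - r2[0]) <= delta
            and abs(r1[1] - r2[1]) <= delta
            and abs(r1[0] + r1[2] - r2[0] - r2[2]) <= delta
            and abs(r1[1] + r1[3] - r2[1] - r2[3]) <= delta)


def group_rectangles(rects: List[Rect], group_threshold: int, eps: float = 0.2) -> List[Rect]:
    """Hypothetical simplified grouping: cluster similar rectangles, keep clusters
    with more than group_threshold members, and return each cluster's average."""
    if group_threshold <= 0 or not rects:
        return list(rects)  # nothing to group: return the input unchanged

    # Union-find over the "similar" relation, so clusters are transitive.
    parent = list(range(len(rects)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(rects)):
        for j in range(i + 1, len(rects)):
            if similar(rects[i], rects[j], eps):
                parent[find(i)] = find(j)

    clusters: Dict[int, List[Rect]] = {}
    for i, r in enumerate(rects):
        clusters.setdefault(find(i), []).append(r)

    result: List[Rect] = []
    for members in clusters.values():
        n = len(members)
        if n <= group_threshold:
            continue  # too few raw hits: treated as a false positive
        result.append(tuple(round(sum(r[k] for r in members) / n) for k in range(4)))
    return result


detections = [
    (100, 100, 50, 50), (102, 98, 50, 50), (98, 101, 52, 52),  # object A
    (300, 120, 48, 48), (304, 118, 48, 48),                     # object B
    (500, 400, 40, 40),                                         # isolated hit
]
print(group_rectangles(detections, group_threshold=1, eps=0.2))
# → [(100, 100, 51, 51), (302, 119, 48, 48)]
```

The two separate objects survive as two rectangles; only the isolated hit is dropped, because its cluster has a single member. Raising `eps` far enough would eventually merge neighbouring objects into one, which is why it has to be tuned for the particular case.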

So the real secret is in collecting and cropping your positives. For non-rigid objects you need thousands of them, and you have to find a compromise between uniform positioning and added variance. For negatives, use as many images as you have; you may need a couple of thousand.
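The compromise between uniform positioning and some variance can be sketched as follows. This assumes each positive image comes with a hand-annotated key point (such as the nose for faces); the helper name `jittered_crops`, the ±5% shift and the ±5° rotation are hypothetical, illustrative values, not recommendations from the answer. The sketch only computes crop boxes and angles, so it needs no image library.

```python
import random
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height)


def jittered_crops(center: Tuple[float, float], size: int, img_w: int, img_h: int,
                   n: int, max_shift: float = 0.05, max_angle: float = 5.0,
                   seed: int = 0) -> List[Tuple[Box, float]]:
    """Hypothetical helper: square crop boxes centred on an annotated key point,
    each with a small random offset and a small rotation angle.

    max_shift is the largest offset as a fraction of `size`; max_angle is in degrees.
    Boxes are clamped so they always stay inside the image.
    """
    if size > img_w or size > img_h:
        raise ValueError("crop size must fit inside the image")
    rng = random.Random(seed)  # fixed seed keeps the training set reproducible
    crops: List[Tuple[Box, float]] = []
    for _ in range(n):
        # Let the key point "float" around the centre of the crop.
        dx = rng.uniform(-max_shift, max_shift) * size
        dy = rng.uniform(-max_shift, max_shift) * size
        x = int(round(center[0] + dx - size / 2))
        y = int(round(center[1] + dy - size / 2))
        x = min(max(x, 0), img_w - size)
        y = min(max(y, 0), img_h - size)
        # Small rotation to be applied to the crop before resizing to w x h.
        angle = rng.uniform(-max_angle, max_angle)
        crops.append(((x, y, size, size), angle))
    return crops


# Example: a nose annotated at (320, 240) in a 640x480 image, 120-pixel crops.
for box, angle in jittered_crops((320, 240), 120, 640, 480, n=3):
    print(box, round(angle, 1))
```

Each box would then be cropped, rotated by its angle and resized to the -w x -h training size by whatever image tooling you use; keeping the jitter small preserves the uniform feature positions that traincascade relies on.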


Thanks for your answer!

I find it really interesting that there is a lot of information online about detecting faces and people, but when it comes to rigid objects, like the brand of a computer or a stapler, there is very little.

About groupRectangles, I've given up trying to use it, because I can't get it to detect more than one object at a time.

But thank you very much, anyway :)

Bia6969 (2012-08-06 04:52:29 -0600)


Asked: 2012-08-02 08:50:43 -0600

Seen: 2,255 times

Last updated: Aug 04 '12