
Understanding the limitations of traincascade

Hi,

I have been trying to train a cascade classifier for cigarette boxes, but the results vary with the type of box. If the box is mostly white, the resulting classifier does not work. I have attempted multiple trainings, both using a script to generate more samples and using manually cropped samples of the objects. Neither method has worked.

I was hoping someone could help with the following questions, and/or point me to reading material that explains why the training doesn't work:

Does cascade classification have any limitation on the kind of object that can be detected? Does having more features/surfaces increase the chance of detection (e.g. a cigarette box vs. a telephone)? Are objects with a lot of white harder to detect?
Impact of object pose: I have used training images taken from a variety of angles, some directly overhead and some at varying angles to the object. As a result, extra faces of the box show in some images and not in others. Does the classifier need images of only one pose (e.g. only the front face should be cropped and nothing else)? I am asking because a similar training for cars that I came across uses only a single pose of the car. I am not sure what this means for objects like boxes/books, which have primarily a front and a back face.
Can a cascade classifier be trained for a single type of object, e.g. a specific brand of cigarettes, or is it better to train it for cigarette packs in general and then run a second detection step to determine the brand? I have come across threads where people talk about training for a general object type and then using sub-classifiers for a particular type of that object (e.g. flowers, and then looking for a particular kind of flower). Are there any limitations on the types of objects that can be trained?
When cropping images of rotated objects, part of the background always ends up in the crop. What is the impact of having background in the cropped image? I assume that when using createsamples to generate artificial samples, it makes sense to use closely cropped images with no background so that the generated samples are more realistic. I assume this is not as strictly required when using real samples rather than generated ones.
What is the impact of a large amount of intensity variation in the training images? Is it good, bad, or does it depend on the actual images in which the object is supposed to be detected? Naotoshi Seo's blog suggests that having fewer illumination and pose variations is better for training faces. However, he does use the script, which itself adds illumination and pose variations.
Is it correct to generate positive images using videos? Or is it better to get individual samples using differing backgrounds each time?
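
To make the background question about rotated crops concrete, here is a rough sketch of how much of an axis-aligned crop is background when a rectangular face is rotated in-plane. It assumes the face is a flat `width` x `height` rectangle and that the crop is the tightest upright bounding box around it; the function name and the 55 x 88 pack dimensions are only illustrative, and out-of-plane tilt is ignored.

```python
import math


def background_fraction(width, height, angle_deg):
    """Fraction of the tightest upright crop that is background when a
    width x height rectangle is rotated in-plane by angle_deg."""
    theta = math.radians(angle_deg)
    c, s = abs(math.cos(theta)), abs(math.sin(theta))
    # Size of the axis-aligned bounding box around the rotated rectangle.
    crop_w = width * c + height * s
    crop_h = width * s + height * c
    # Everything in the crop that is not the rectangle itself is background.
    return 1.0 - (width * height) / (crop_w * crop_h)


if __name__ == "__main__":
    # Hypothetical pack face of 55 x 88 (any consistent unit).
    for angle in (0, 10, 20, 45):
        print(angle, round(background_fraction(55, 88, angle), 3))
```

Under these assumptions the background share grows quickly with the rotation angle, which is why I suspect rotated crops behave differently from upright, tightly cropped ones.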

It would be useful if someone can point me in the right direction to getting answers to the above questions. Thanks & Regards.