I'm learning how to train a cascade classifier for object detection. I have a sample project that I trained on a car dataset, and it works OK.

Now I would like to create my own classifier to detect a specific object of my choice. I plan on taking some videos of the object against different backgrounds (positives) and some random videos of backgrounds without the object (negatives).

My questions are, once I extract the images from the videos I take:

  • Do I have to convert the colour images to gray scale?
  • Do I have to resize the images to make them smaller?
  • When I train the classifier and create the vector file, what size should I use for the width and height? Should I use the actual image size or something else?

answered 2016-03-06 10:29:32 -0500

berak
  1. No. (This will be done automatically.)
  2. No. (This will be done automatically as well.) But you do want to crop/extract the rectangular region around your object using the annotation tool, and pass an info.txt file with those boxes to the createsamples tool.
  3. This is a bit tricky. The size there is the minimum size that can be detected later, so on the one hand it should be as small as possible; on the other hand, a larger one might give you better detection. For example, the face cascades were trained using a 24x24 rect. Also note that memory usage during training grows steeply with that size, so keep it as low as possible.
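
To make points 2 and 3 a bit more concrete, here is a rough Python sketch, not an official recipe. It formats annotation boxes as info.txt lines (image path, number of boxes, then x y width height for each box) and picks a small training size that keeps the object's aspect ratio. The file names, the boxes, the helper names `info_line` and `training_window`, and the choice of 24 for the shorter side are all assumptions made up for illustration.

```python
from statistics import median


def info_line(image_path, boxes):
    """Format one info.txt line: path, box count, then x y w h for each box."""
    coords = " ".join(f"{x} {y} {w} {h}" for x, y, w, h in boxes)
    return f"{image_path} {len(boxes)} {coords}"


def training_window(boxes, short_side=24):
    """Suggest a (width, height) pair for the -w / -h options.

    The shorter side is fixed at short_side (an assumed starting point) and
    the longer side is scaled by the median width/height ratio of the boxes.
    """
    ratio = median(w / h for _, _, w, h in boxes)
    if ratio >= 1:
        return round(short_side * ratio), short_side
    return short_side, round(short_side / ratio)


# Hypothetical annotations: frame path -> list of (x, y, width, height) boxes.
annotations = {
    "frames/frame_0001.jpg": [(140, 100, 90, 45)],
    "frames/frame_0002.jpg": [(60, 80, 120, 60), (300, 210, 100, 50)],
}

lines = [info_line(path, boxes) for path, boxes in annotations.items()]
all_boxes = [box for boxes in annotations.values() for box in boxes]
width, height = training_window(all_boxes)

print("\n".join(lines))
print(f"suggested size: -w {width} -h {height}")
```

The same width and height would then go to both the createsamples tool and the training step. If memory becomes a problem during training, lowering `short_side` is the first thing to try.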

Thanks berak!

I'll work on getting my dataset and let you know how it goes.

Otto ( 2016-03-06 13:18:12 -0500 )

