# Weird results from cascade training

So I've been playing around with training of cascade classifiers, and I seem to be getting very similar results regardless of changes I make to the various inputs, and I wanted to ask for some advice before I continued trying to guess what some of the issues might be.

The object I'm trying to detect is a hand holding a cell-phone. My positive samples look like this:

and my negative samples are a wide range of images that do not include somebody holding a phone, generally images of people.

I have about 500 positive and 2000 negative images.

I'm mostly following this guide, which is closely based on Naotoshi Seo's notes and tools.

Using his perl script, I'm creating 5000 positive samples with the opencv_createsamples utility; my actual command looks like this:

```
perl bin/createsamples.pl positives.txt negatives.txt samples 5000 \
  "opencv_createsamples -bgcolor 0 -bgthresh 0 -maxxangle 1.1 \
  -maxyangle 1.1 -maxzangle 0.5 -maxidev 40 -w 20 -h 40"
```


My call to opencv_traincascade then looks like this:

```
opencv_traincascade -data classifier -vec samples.vec -bg negatives.txt -numStages 20 \
  -minHitRate 0.999 -maxFalseAlarmRate 0.5 -numPos 4600 -numNeg 2212 -w 20 -h 40 \
  -mode ALL -precalcValBufSize 2048 -precalcIdxBufSize 2048 -featureType LBP
```
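One thing worth double-checking with parameters like these: opencv_traincascade can consume more samples from the .vec file than numPos, because every stage discards the positives it misclassifies under -minHitRate and pulls replacements. A rule of thumb that circulates in the OpenCV community (it is a guideline, not an official formula) says the .vec file should hold roughly numPos plus (numStages − 1) × (1 − minHitRate) × numPos samples. A quick sketch of that check:

```python
import math

def vec_samples_needed(num_pos, num_stages, min_hit_rate):
    """Approximate minimum .vec sample count for a training run.

    Community rule of thumb: each of the later stages may consume up to
    (1 - min_hit_rate) * num_pos extra samples to replace rejected positives.
    """
    return math.ceil(num_pos + (num_stages - 1) * (1 - min_hit_rate) * num_pos)

# The command above: 5000 samples in the vec, numPos 4600, 20 stages, 0.999
print(vec_samples_needed(4600, 20, 0.999))  # 4688
```

With 5000 samples in the vec that leaves some headroom, but not much; if training ever aborts with a "can not get new positive sample" style error, lowering numPos (or generating more samples) is the usual fix.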


I've been alternating between LBP and Haar, but have lately been preferring LBP because I've been changing inputs and iterating a lot to see what actually changes my results.

The conclusion so far: "not much". Here are two examples of the resulting classifier running:

Which, well, isn't quite what I was looking for.

Any thoughts on what I should try differently? I can keep adding samples if necessary, but that doesn't seem to have changed my results much so far, and I'm wondering if there's something more significant that I'm doing wrong. I've tried samples of different sizes (20x40 and 40x80), I've tried increasing my number of stages, and I've tried both Haar and LBP features. What should I try next?

Bonus questions:

1) How important is the subject matter of the negative images? If we assume that most of the positive matches will come from images in which we see a body from at least the waist up, should I focus on selecting similar types of images for my negatives? More directly, does adding more negative images help if they aren't the type of image that a positive match would resemble (e.g. large face-only portraits, or landscapes)?

2) Is it better to use the -info flag with opencv_createsamples if I have a large number of positive samples that contain the target object? I have a functionally unlimited set of positive and negative images, so I could very conceivably have thousands of samples without needing to synthesize them from positive + negative images, but synthesizing seems to be working pretty well for other people.


## Answer

Basically, what you are doing wrong is using the createsamples utility to introduce transformations. This works in clean lab environments, but it creates unrealistic features for the object in real-life situations. You should start by removing that step and gather positive samples the hard way: by collecting thousands of original training images with as much variation as possible captured in the samples.

You will also need to introduce a lot more negatives. You want to detect an object instance in a random situation, where the variation in possible backgrounds is huge. Think about it this way: you should try to include every possible background as a negative. This means that many object models for in-the-wild detection come with huge sets of negatives, in the hundreds of thousands. The advantage, however, is that you can provide a wide range of negatives that are larger than your model window, and the algorithm will take random samples from those images. That means that 5000 images of large resolution could easily get you 150,000 windows to train as negatives.
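For what it's worth, the negatives file is just a plain-text list of image paths, one per line, so scaling it up to thousands of backgrounds is mostly a matter of listing a folder. A small sketch (the directory and file names here are placeholders):

```python
import os

def write_negatives(image_dir, out_path,
                    exts=(".jpg", ".jpeg", ".png", ".bmp")):
    """List every image in image_dir into a negatives file, one
    absolute path per line, as opencv_traincascade expects.

    Returns the number of paths written."""
    paths = sorted(
        os.path.abspath(os.path.join(image_dir, name))
        for name in os.listdir(image_dir)
        if name.lower().endswith(exts)
    )
    with open(out_path, "w") as f:
        f.write("\n".join(paths) + "\n")
    return len(paths)

# e.g. write_negatives("negatives/", "negatives.txt")
```

Absolute paths avoid the common gotcha of the training run failing to find negatives when launched from a different working directory.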

Aside from that, parameter tweaking is always one of the most time-consuming parts of the process. Each application will have its own set of specific settings that gets the best result. This is a long period of trial and error, I'm afraid, though it could be partially automated.

Also, use LBP until you reach a somewhat decent model. Training is about tenfold faster and detection is faster too, mainly due to using integer operations in both steps. It will get models with a lot of samples trained in days rather than weeks compared to Haar features.

1. Actually, the viewpoint in academic research is "the more the merrier". Adding more negatives, even ones with something like a 0.001% chance of ever appearing in front of your detector, will increase the performance of your system! You want to reduce all random background variation as much as possible and exclude it from your search space.
2. I suggest quitting the old C-style haartraining API and taking a look at the newer, better-developed traincascade interface using the C++ API. It works way better! The -info option then takes the annotation file, in which you specify where the bounding boxes are found. And yes, annotating all positives in the positive dataset is also one of the most time-consuming parts, but it improves the automatic scaling to a standard size immensely!
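The annotation ("info") file mentioned above is plain text: one line per image, with the image path, the number of objects, and then x y width height for each bounding box. If your own tooling produces boxes, a small helper can emit that format (the image names and box values below are made up for illustration):

```python
def write_info_file(annotations, out_path):
    """Write an opencv_createsamples/opencv_traincascade annotation file.

    annotations: dict mapping image path -> list of (x, y, w, h) boxes.
    Each output line: <path> <count> <x1> <y1> <w1> <h1> [<x2> <y2> <w2> <h2> ...]
    """
    with open(out_path, "w") as f:
        for image_path, boxes in annotations.items():
            fields = [image_path, str(len(boxes))]
            for (x, y, w, h) in boxes:
                fields += [str(x), str(y), str(w), str(h)]
            f.write(" ".join(fields) + "\n")

# Hypothetical usage:
# write_info_file({"img/hand_001.jpg": [(140, 100, 45, 90)]}, "positives.info")
# writes the line: img/hand_001.jpg 1 140 100 45 90
```

The resulting file is then passed to opencv_createsamples via -info to pack the annotated crops into a .vec file.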

Great answer, thanks for your help. I'll put something together to help me generate annotation files and see what my results look like that way.

(2013-09-24 11:58:00 -0500)
Dear cmyr, how did your results fare? Can you please share your lessons learned? (I know this is about a year old, but whatever you can recollect might be helpful for me.) Thanks, samjakar

(2014-08-25 05:48:41 -0500)
