

OpenCV: poor performance of a Haar classifier I trained myself

I would like to use a Haar classifier to detect the presence of vehicles in a scene (so far I am trying only with cars). Since I have not found many trained XML files online, I decided to generate my own.

I found some image sets of vehicles that have been used for similar purposes (training computer vision algorithms) and used them to create my own XML files. Training has been running for almost a week and some of the runs have finished, so I tried those classifiers, but the results were terrible. The classifiers I found online work decently: they at least appear to be trying to detect vehicles, and they run fast enough for real-time use (roughly 5-10 FPS).

Mine, by contrast, can take several minutes to analyze a single frame with detectMultiScale() using the same parameters. If I pass different parameters (e.g. a larger minimum size, a smaller maximum size, a larger scale factor), it runs faster (maybe 1 FPS) but detects nothing useful: it never finds a vehicle and instead flags random patches of asphalt as vehicles.
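To see why those parameters change the speed so much, here is a rough Python sketch that counts how many candidate windows a sliding-window cascade has to evaluate per frame for a given training window size, scale factor, minSize and maxSize. `count_windows` is a hypothetical helper, and it assumes a 1-pixel step at every scale, which only approximates what detectMultiScale() actually does internally.

```python
def count_windows(frame_w, frame_h, win_w, win_h, scale_factor=1.1,
                  min_size=(0, 0), max_size=None):
    """Estimate (scale levels, windows) scanned for one frame.

    Assumption: the window grows by scale_factor per level and every
    1-pixel offset is tested. OpenCV's detectMultiScale() differs in
    details (e.g. step size), so treat this as an order-of-magnitude guide.
    """
    if scale_factor <= 1.0:
        raise ValueError("scale_factor must be > 1")
    if max_size is None:
        max_size = (frame_w, frame_h)
    levels, total = 0, 0
    factor = 1.0
    while True:
        # Window size in original-frame pixels at this scale level.
        w, h = round(win_w * factor), round(win_h * factor)
        if w > frame_w or h > frame_h or w > max_size[0] or h > max_size[1]:
            break
        if w >= min_size[0] and h >= min_size[1]:
            # Equivalent to sliding the fixed training window over a
            # frame downscaled by `factor`.
            scaled_w, scaled_h = int(frame_w / factor), int(frame_h / factor)
            total += max(0, scaled_w - win_w + 1) * max(0, scaled_h - win_h + 1)
            levels += 1
        factor *= scale_factor
    return levels, total


if __name__ == "__main__":
    slow = count_windows(640, 480, 24, 24, scale_factor=1.1)
    fast = count_windows(640, 480, 24, 24, scale_factor=1.3, min_size=(60, 60))
    print("scale 1.1:", slow, "| scale 1.3, minSize 60:", fast)
```

Raising the scale factor or minSize (or lowering maxSize) removes whole scale levels, including the small-window levels that contribute the most positions. In general, every remaining window is then passed through the cascade, so a cascade whose early stages reject few windows pays the full per-window cost many times over.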

Where did I go wrong in generating my files? I have limited time to finish this task, and each classifier can take a whole week to train, so I have very few attempts left. For reference, my methodology is (following this tutorial):

- Take all the positive and negative images; if a data set supplies no negative images, take negatives from another data set, with at least as many negatives as positives.

- Generate as many samples as there are positive images.

- Use the same parameters as the tutorial suggests, except the image size (set to the size of the images in the given data set) and nstages (set to 10, because 20 takes far too long).

- For the npos parameter, I use 1/10th of the number of samples. Using the full number of samples resulted in an "assertion failed" error after a few hours; according to this, npos apparently cannot equal the number of samples, so I gave myself a safety margin.
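For comparison with the 1/10th margin, a frequently cited rule of thumb for opencv_traincascade is that the .vec file must contain at least numPos + (numStages - 1) * (1 - minHitRate) * numPos + S samples, where S is the number of positives that get rejected as background along the way. This sketch assumes the post's npos/nstages correspond to that tool's -numPos/-numStages and that -minHitRate is left at its default of 0.995; `max_num_pos` is a hypothetical helper that simply inverts the inequality.

```python
import math


def max_num_pos(vec_count, num_stages, min_hit_rate=0.995, background_margin=0):
    """Largest -numPos that should not exhaust the .vec file (rule of thumb).

    Assumed requirement: vec_count >= numPos * (1 + (numStages - 1) *
    (1 - minHitRate)) + S. S is unknown before training, so
    background_margin is a user-chosen guess for it.
    """
    usable = vec_count - background_margin
    if usable <= 0:
        raise ValueError("background_margin leaves no usable samples")
    # Positives consumed per requested positive across all stages.
    per_pos = 1 + (num_stages - 1) * (1 - min_hit_rate)
    return math.floor(usable / per_pos)


if __name__ == "__main__":
    # 1,000 samples in the .vec file, 10 stages, default minHitRate.
    print(max_num_pos(1000, 10))
    print(max_num_pos(1000, 10, background_margin=100))
```

With 1,000 samples and 10 stages this gives 956 (861 with a 100-sample allowance for S), versus 100 at 1/10th, which suggests the 1/10th margin discards most of the available positives.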

TL;DR: The Haar classifier I trained myself performs much worse than ones found online (in both speed and accuracy). I need advice on how to improve it without wasting another week on training.