
Opencv_traincascade training too fast?

asked 2018-04-18 13:39:16 -0500

uckizz


I'm creating my own car wheel detection cascade as a fun project. My attempt so far is based on several tutorials, and I've described the whole process below:

Positive data: 40 images of a car wheel, cropped from photos of cars and downsized to 50x50 PNG (approx. 7 KB each). Negative data: 600 random outdoor photos containing no cars or wheels, resized to 500x500 JPG (approx. 100 KB each).

Used Naotoshi Seo's Perl script to generate 1500 positive samples (same settings, except -w and -h both set to 50).

Used his script to merge all the .vec files generated.

Used the same training parameters with opencv_traincascade, except with LBP features and with -w and -h both set to 50.

Well, training is super fast (a couple of minutes for 20 stages), and when I tested the cascade it produced a lot of false positives. I suspect something is wrong with the data, or that some parameters/settings need tweaking.

Does anyone have ideas or tips on which parameters, settings, or data tweaks I can use to get better performance?





"Positive data: 40 images of a carwheel," -- come back with 10x or even 100x that

berak ( 2018-04-18 13:46:07 -0500 )

I thought the Perl script generated more samples? I followed a tutorial in which the author starts from 40 images and then uses the script to generate 1500 samples.

uckizz ( 2018-04-18 13:48:12 -0500 )

forget the perl script (or any silly attempt at generating synthetic data from single images)

berak ( 2018-04-18 13:50:33 -0500 )

1 answer


answered 2018-04-19 03:37:10 -0500

updated 2018-04-19 03:39:32 -0500

Ah, I was waiting for this one to come back. If you want more detailed background, go for chapter 5 of OpenCV 3 Blueprints, but here are some pointers.

  • As @berak said, forget generating artificial samples with the Perl script. The approach simply does not hold up and produces bad classifiers. Go for purely real samples: 50 real samples are better than 1000 artificial ones.
  • You then don't need mergevec either, which tends to cause issues for a lot of people.
  • Fast training means the positive and negative samples are easy to separate: each stage probably needs only a couple of weak classifiers to separate them successfully. You can increase the complexity, and with it the training time, by adding more training data, setting stricter training parameters, or even increasing the sample resolution.
  • False positives mean that your detector still does not know exactly what a negative sample looks like, so it needs more negative data. Try negative bootstrapping: run your initial detector on images that contain no wheels, collect the false positives, and feed those back in as hard negatives.
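The negative bootstrapping step can be sketched in Python roughly as follows. This is a minimal sketch, not a tested pipeline: the helper names `clip_box` and `mine_hard_negatives`, the `.jpg` glob, the output file naming, and the `detectMultiScale` settings are all assumptions for illustration. It relies on the fact that any detection on an image known to contain no wheels is, by definition, a false positive.

```python
from pathlib import Path


def clip_box(box, img_w, img_h):
    """Clip an (x, y, w, h) box to the image; return None if nothing is left."""
    x, y, w, h = box
    x0, y0 = max(0, x), max(0, y)
    x1, y1 = min(img_w, x + w), min(img_h, y + h)
    if x1 <= x0 or y1 <= y0:
        return None
    return (x0, y0, x1 - x0, y1 - y0)


def mine_hard_negatives(cascade_xml, negative_dir, out_dir):
    """Run the current cascade on wheel-free images and save every detection.

    Hypothetical helper: each saved crop is a false positive of the current
    detector and can be added to the negative set for the next training round.
    """
    import cv2  # imported here so clip_box stays usable without OpenCV

    detector = cv2.CascadeClassifier(str(cascade_xml))
    if detector.empty():
        raise ValueError(f"could not load cascade: {cascade_xml}")

    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    saved = []
    for path in sorted(Path(negative_dir).glob("*.jpg")):
        img = cv2.imread(str(path))
        if img is None:  # unreadable file, skip it
            continue
        gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
        img_h, img_w = gray.shape
        # Assumed detection settings; tune them for your own data.
        hits = detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=3)
        for i, hit in enumerate(hits):
            clipped = clip_box(tuple(int(v) for v in hit), img_w, img_h)
            if clipped is None:
                continue
            x, y, w, h = clipped
            out_path = out_dir / f"{path.stem}_fp{i}.png"
            cv2.imwrite(str(out_path), img[y:y + h, x:x + w])
            saved.append(out_path)
    return saved
```

After a run, the saved crops would be appended to the negative image list and the cascade retrained; repeating this a few times is the usual way to drive the false positive count down.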

The Q&A literally has 1000 questions on this; you would be amazed how much detail you can find here.
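If you switch to real samples only, as suggested above, the positives can go into a single annotation file that opencv_createsamples reads through its -info option, which is why no merging of .vec files is needed. Below is a small sketch that writes that file and the negative list. The function names, file names, and directory layout are assumptions for illustration; the line format (image path, object count, then x y w h per object) is the one opencv_createsamples expects.

```python
from pathlib import Path


def positive_line(image_path, boxes):
    """Build one annotation line: path, object count, then x y w h per object."""
    coords = " ".join(f"{x} {y} {w} {h}" for x, y, w, h in boxes)
    return f"{image_path} {len(boxes)} {coords}"


def write_description_files(positives, negatives, info_path, bg_path):
    """Write the positive annotation file and the negative (background) list.

    positives: iterable of (image_path, [(x, y, w, h), ...]) pairs
    negatives: iterable of image paths that contain no wheels
    """
    info_text = "".join(positive_line(p, boxes) + "\n" for p, boxes in positives)
    Path(info_path).write_text(info_text)
    Path(bg_path).write_text("".join(f"{n}\n" for n in negatives))


if __name__ == "__main__":
    # Hypothetical layout: wheels already cropped to 50x50, so each box
    # covers the whole image.
    pos = [(f"pos/wheel_{i:03d}.png", [(0, 0, 50, 50)]) for i in range(3)]
    neg = [f"neg/outdoor_{i:03d}.jpg" for i in range(3)]
    write_description_files(pos, neg, "positives.txt", "negatives.txt")
```

The resulting positives.txt would then be passed to opencv_createsamples with -info, together with -vec, -num, -w and -h, to produce one .vec file directly.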



