
About opencv_traincascade....

asked 2016-04-04 07:20:55 -0500

aripod

updated 2016-04-11 04:57:54 -0500


I am currently trying to train my own cascade based on Naotoshi Seo's tutorial and Coding Robin's tutorial. I still have a couple of questions that I haven't found answers to.

I will have some objects to detect in a controlled scenario (it will always be the same room and the objects will not vary). Therefore, my idea is to take my camera and save frames of the room for a certain amount of time with NONE of the objects present, to gather the negative images. Then I would put the objects I want to detect on a turntable (one at a time, of course...), set the camera on a tripod and, for different heights, choose a ROI surrounding the object and decide when to start and stop saving images while the object rotates. Thus, I would have several views of the same object from different angles. I would also get the X, Y position and the size of the bounding box, so I could easily save a file with the path, the number of objects in the scene and these four parameters to create the .vec file.
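The annotation file described above (image path, number of objects, then X, Y, width and height per object) can be sketched as follows. This is only a minimal illustration of that line layout; the `Box` struct, the `annotationLine` helper and the file name in the usage note are made up for the example and are not part of OpenCV.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// One bounding box in pixel coordinates: top-left corner plus size.
// (Hypothetical helper type, not an OpenCV class.)
struct Box {
    int x, y, width, height;
};

// Builds one line of an annotation file in the layout
// "<image path> <number of objects> <x y w h> <x y w h> ...".
std::string annotationLine(const std::string& imagePath,
                           const std::vector<Box>& boxes) {
    assert(!imagePath.empty());  // every line needs an image path
    std::ostringstream line;
    line << imagePath << ' ' << boxes.size();
    for (const Box& b : boxes) {
        line << ' ' << b.x << ' ' << b.y << ' ' << b.width << ' ' << b.height;
    }
    return line.str();
}
```

For a hypothetical frame `pos/frame_0001.png` with one object at (140, 100) of size 45x45, this yields the line `pos/frame_0001.png 1 140 100 45 45`.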

My questions are:

  1. I should save the images as grey scale, right?
  2. At which resolution should I save the negative images? (My camera is 1280x1024.) Original or resized to.....?
  3. Should I save the entire image or just the ROI for the positive images?

I'd like to test this because my first approach did not work well: I took a picture of an object with my phone, cropped it and removed the background (50x50 grey scale image), and used it with opencv_createsamples plus the negatives that I took as described before (saved as grey scale, 100x100).

Then, to get my positive samples for training, I ran:

opencv_createsamples -img mouse1Resized.png -bg bg.txt -info pos/info.lst -jpgoutput pos/ -maxxangle 0.5 -maxyangle -0.5 -maxzangle 0.5 -num 1690 - bgcolor - -bgthresh 0

where 1690 is the number of negative images that I captured. Then I created the vec file with:

opencv_createsamples -info pos/info.lst -num 1690 -w 20 -h 20 -vec positives.vec

Info file name: pos/info.lst

And started training with:

opencv_traincascade -data data -vec positives.vec -bg bg.txt -numPos 1400 -numNeg 700 -numStages 15 -w 20 -h 20

When this finished, I tried the detector and I got a LOT of false positives, even when the object was not in the scene.

So here are some more questions.

  1. Should the negatives be 100x100?
  2. Should the positive be 50x50?
  3. When I create the .vec file, how large can -w and -h be?

I would like to test the different approaches to see which gives the best results....or, based on your experience, which one should I follow?

Thanks for the help.


This is the code I use for detections:

void detect(Mat frame, std::vector<Rect> &objects)
    int i, div=2;
    Mat frame_gray;
    resize(frame, frame_gray, Size(frame.cols/div,frame.rows/div));
    cvtColor(frame_gray, frame_gray ...

1 answer


answered 2016-04-05 07:53:10 -0500

updated 2016-04-06 04:58:49 -0500

Let us start by formulating some answers

  • "I am currently trying to train my own cascade based on Naotoshi Seo's tutorial and Coding Robin's tutorial." Though many online tutorials refer to them, in my honest opinion they are seriously lacking info, they use old interfaces, use plain wrong settings and they are far from up to date. Avoid them and look for more recent tutorials, like the complete chapter in OpenCV 3 Blueprints on cascade classifier training, or the complete content of this forum, which is far better than those tutorials.
  • Yes, grayscale is the way to go, though if you supply colour images, opencv_createsamples will convert them to grayscale and apply histogram equalization itself. So you can provide annotations on coloured images.
  • Negative images can be as large as you prefer. They basically get sampled at the positive sample size until the whole image is covered. They should however be equal to or larger than -w x -h to avoid negatives being ignored.
  • You should save the entire positive image and create a positives.txt file containing annotations, which are basically bounding boxes of the objects.
  • "removed the background": Please don't ... in your application the background will also be present! You need to make your model robust to background noise.
  • You used the tool to warp images ... read my chapter in OpenCV 3 Blueprints or search this forum to find out why you should absolutely NOT do that, but rather collect meaningful positive samples!
  • "I got a LOT of false positives": this means that your model still needs more training OR more training data AND surely more negatives to discriminate the background. Use bootstrapping, also described in the chapter, to use your first model to improve a second one!
  • Negatives can be of any size, positives too.
  • -w and -h can be as large as you want BUT know that the larger they are, the more features they will contain AND thus the more memory you will need to store them during training! On a system with limited memory, try to get the sizes down as much as possible without losing valuable edge info.
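To give a feeling for how quickly the feature pool grows with -w and -h, here is a rough sketch that counts the five classic upright Viola-Jones rectangle shapes in a training window. This is an assumption made for illustration: the feature sets actually used by opencv_traincascade (its Haar modes or LBP) differ, so treat the numbers as an order of magnitude only. The function names are made up for the example.

```cpp
#include <cassert>
#include <cstdint>

// Counts all placements of one upright feature whose smallest form is
// baseW x baseH pixels, scaled by integer factors in x and y, inside a
// winW x winH training window.
std::int64_t countPlacements(int winW, int winH, int baseW, int baseH) {
    assert(winW > 0 && winH > 0 && baseW > 0 && baseH > 0);
    std::int64_t total = 0;
    for (int w = baseW; w <= winW; w += baseW) {
        for (int h = baseH; h <= winH; h += baseH) {
            // Number of top-left positions where a w x h feature still fits.
            total += static_cast<std::int64_t>(winW - w + 1) * (winH - h + 1);
        }
    }
    return total;
}

// Sum over the five classic shapes: two-rectangle (horizontal, vertical),
// three-rectangle (horizontal, vertical) and four-rectangle.
std::int64_t countClassicHaarFeatures(int winW, int winH) {
    return countPlacements(winW, winH, 2, 1) + countPlacements(winW, winH, 1, 2) +
           countPlacements(winW, winH, 3, 1) + countPlacements(winW, winH, 1, 3) +
           countPlacements(winW, winH, 2, 2);
}
```

Under this simplified count, a 20x20 window holds 78,460 candidate features and a 24x24 window already holds 162,336, so a modest increase in window size roughly doubles what has to be evaluated and stored.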

Again, go read the book! It is a collection of 2 years of my PhD experience on the interface. You might be surprised what a bunch of info it contains!


Therefore, as I can take 50 images of my object from different angles and use them as positives. Would this be better?

I am of that opinion, yes. However, do accept that a cascade classifier is not completely viewpoint invariant, so if the viewpoint changes too drastically you might need 2 models. This can for example be seen in the face models, where we have a separate model for frontal faces and for profile faces. However, a face that is rotated 15 degrees from a profile face will still be detected by the profile face model, and vice versa for the frontal face.

For the negatives, I can do a 'random ... (more)




I read it and I agree ;-) Very helpful indeed!

Mathieu Barnachon ( 2016-04-05 09:46:06 -0500 )
StevenPuttemans ( 2016-04-06 04:17:53 -0500 )

I've started reading the book, but in the meantime I'd like to ask a few things so I can leave it training while I read. I read that you answered that it's better to have 50 good positive images than to take one and generate 50 with opencv_createsamples. I can take 50 images of my object from different angles and use them as positives. Would this be better? The other thing is the negatives. As I want to detect the objects in a controlled environment (e.g. my office), I can do a 'random' walk gathering images without the object, right?

I also read that I should aim for a NEG count : acceptanceRatio of around 0.0004 for a good cascade, and that a value of ~5.3557e-05 means it is overtrained. Is that right?

aripod ( 2016-04-06 04:30:28 -0500 )

Forgot to ask. By default the boosting algorithm is GAB. Which one gives better results? I believe they chose GAB as it is the one that consumes less RAM? The computer I'm using has 32 GB of RAM plus another 32 GB of swap so, should I go with RAB?

aripod ( 2016-04-06 04:49:30 -0500 )

Look at the updates in my answer ;) will add the last question also!

StevenPuttemans ( 2016-04-06 04:57:41 -0500 )

I thought the viewpoint variation was due to how the cascade was trained. So if I want to detect a phone, I should capture pictures of it from the front, train a model, rotate it a few degrees and train a new model again....but, let's say I end up with 3 models per object and I have 5 or 6 objects. This leads to ~15 models which I have to run on each frame to see if any of my objects is there. I guess it will take a lot of time and real-time detection won't be possible, right? Of course, I believe that the idea of using a turntable to capture as many pictures of the object as possible is ruled out....

And how do you calculate the precision and recall values after each stage and see if it increases or decreases ...(more)

aripod ( 2016-04-06 07:28:26 -0500 )

I've just gathered 60 images of the phone from different angles (a few degrees...) and heights, plus 300 negative samples from randomly walking around. It took 1 or 2 minutes and I got a "Required leaf false alarm rate achieved" with NEG count : acceptanceRatio 250 : 0.000752328, but I still got around 25 false positives......I trained it with RAB....

aripod ( 2016-04-06 08:16:40 -0500 )
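As a side note on the "NEG count : acceptanceRatio" figures quoted above: as far as I understand the training tool, the ratio is the number of negatives kept for the stage divided by the number of negative windows it had to scan to find them, and training stops once that ratio drops below the per-stage false alarm rate raised to the number of stages. Both points are stated here as assumptions, and the helper names below are made up for this sketch.

```cpp
#include <cassert>
#include <cmath>

// Estimated number of negative windows scanned to collect negCount negatives,
// assuming acceptanceRatio = negCount / windowsScanned.
double estimateWindowsScanned(int negCount, double acceptanceRatio) {
    assert(negCount >= 0 && acceptanceRatio > 0.0);
    return negCount / acceptanceRatio;
}

// Overall false alarm target of a cascade in which every stage is allowed to
// pass at most maxFalseAlarmRate of the negatives it sees.
double overallFalseAlarmTarget(double maxFalseAlarmRate, int numStages) {
    assert(maxFalseAlarmRate > 0.0 && maxFalseAlarmRate <= 1.0 && numStages > 0);
    return std::pow(maxFalseAlarmRate, numStages);
}
```

With the reported 250 : 0.000752328, roughly 332,000 windows were scanned to find 250 that the cascade still accepted. For 15 stages and an assumed per-stage rate of 0.5, the overall target would be about 3.05e-05.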

@aripod detecting objects in a completely rotation and viewpoint invariant way is indeed still something that can NOT be done in real time. The techniques and hardware needed for that are simply not yet standard, not even in research communities. It differs if you have a set of known parameters like symmetry, camera distance, lighting conditions, ... but then it is still very challenging! As to the turntable, it won't work since the object will have the background of the turntable, and thus the edge features will not be representative for real use, leading to false positive detections or no detections at all. For precision and recall calculations, take a look at Wikipedia, it explains them pretty well.

StevenPuttemans ( 2016-04-07 03:32:36 -0500 )
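For reference, the precision and recall mentioned in the comment above follow the standard definitions: precision = TP / (TP + FP) and recall = TP / (TP + FN). A minimal sketch, where the `DetectionCounts` struct is a made-up helper and the counts would come from comparing detections against hand-made annotations on a test set:

```cpp
#include <cassert>

// Outcome of comparing detections against annotated ground truth.
// (Hypothetical helper type for this sketch.)
struct DetectionCounts {
    int truePositives;   // detections that match an annotated object
    int falsePositives;  // detections with no matching object
    int falseNegatives;  // annotated objects that were missed
};

// Fraction of detections that are correct; 0 when nothing was detected.
double precision(const DetectionCounts& c) {
    assert(c.truePositives >= 0 && c.falsePositives >= 0);
    const int detections = c.truePositives + c.falsePositives;
    return detections == 0 ? 0.0 : static_cast<double>(c.truePositives) / detections;
}

// Fraction of annotated objects that were found; 0 when there are no objects.
double recall(const DetectionCounts& c) {
    assert(c.truePositives >= 0 && c.falseNegatives >= 0);
    const int objects = c.truePositives + c.falseNegatives;
    return objects == 0 ? 0.0 : static_cast<double>(c.truePositives) / objects;
}
```

Many false positives pull precision down, while missed objects pull recall down, so the two numbers point to whether more negatives or more positives are needed.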

For the trained model: there is a difference between negative images and negative samples. Can you give me your training parameters? Then it will become clearer what you did. Your model is actually telling you that it cannot reject negative windows as negatives, so you need more negatives. Are your actual objects being detected? If not, you need more positives.

StevenPuttemans ( 2016-04-07 03:36:49 -0500 )
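To make the difference between negative images and negative samples concrete: one negative image yields many negative samples, because windows of the model size are cut out of it. The sketch below counts such windows at a single scale with a fixed step. This is a simplification assumed for illustration (the training tool may also rescale the image), and `countWindows` is a made-up helper name.

```cpp
#include <cassert>
#include <cstdint>

// Number of winW x winH windows that fit into an imgW x imgH image when the
// window slides by `step` pixels, at a single scale. Returns 0 when the image
// is smaller than the window, so such an image yields no negative samples.
std::int64_t countWindows(int imgW, int imgH, int winW, int winH, int step) {
    assert(imgW > 0 && imgH > 0 && winW > 0 && winH > 0 && step > 0);
    if (imgW < winW || imgH < winH) {
        return 0;
    }
    const std::int64_t across = (imgW - winW) / step + 1;
    const std::int64_t down = (imgH - winH) / step + 1;
    return across * down;
}
```

With the 100x100 negatives and the 20x20 model size from the question, a step of one pixel gives 81 x 81 = 6561 candidate windows per image, while an image smaller than 20x20 gives none.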

Maybe I don't use a rotating table but "I" rotate around the object to get the background? Here is my dataset. The object is rarely detected. Once again, thanks for your big help!

aripod ( 2016-04-07 04:27:34 -0500 )
