
About opencv_traincascade....

asked 2016-04-04 07:20:55 -0600 by aripod

updated 2016-04-11 04:57:54 -0600

Hello,

I am currently trying to train my own cascade based on Naotoshi Seo's tutorial and Coding Robin's tutorial, but I still have a couple of questions that I haven't found answers to.

I will have some objects to detect in a controlled scenario (it will always be the same room and the objects will not vary). My idea is therefore to grab my camera and save frames of the room for a certain amount of time with NONE of the objects present, to gather the negative images. Then I would put the objects I want to detect on a turntable (one at a time, of course...), set the camera on a tripod and, for different heights, choose an ROI surrounding the object and choose when to start and stop saving images while the object rotates. Thus I would have several views of the same object from different angles, and I can get the X, Y position plus the size of the bounding box, and easily save a file with the path, the number of objects in the scene and these four parameters, to create the .vec file.

My questions are:

  1. I should save the images as grey scale, right?
  2. At which resolution should I save the negative images? (My camera is 1280x1024.) Original, or resized to...?
  3. Should I save the entire image or just the ROI for the positive images?

I'd like to test this because, as a first approach, I took a picture of an object with my phone, cropped it and removed the background (a 50x50 grey scale image), and used opencv_createsamples plus the negatives that I captured as described before (saved as 100x100 grey scale).

Then, to get my positive samples for training, I ran:

opencv_createsamples -img mouse1Resized.png -bg bg.txt -info pos/info.lst -jpgoutput pos/ -maxxangle 0.5 -maxyangle 0.5 -maxzangle 0.5 -num 1690 -bgcolor 0 -bgthresh 0

where 1690 is the number of negative images that I captured. Then I create the vec file with:

opencv_createsamples -info pos/info.lst -num 1690 -w 20 -h 20 -vec positives.vec

And start training with:

opencv_traincascade -data data -vec positives.vec -bg bg.txt -numPos 1400 -numNeg 700 -numStages 15 -w 20 -h 20

When this finished, I tried the detector and I got a LOT of false positives, even when the object was not in the scene.

So here are some more questions.

  1. Should the negatives be 100x100?
  2. Should the positives be 50x50?
  3. When I create the .vec file, how large can -w and -h be?

I would like to test different approaches to see which gives the best results... or, based on your experience, which one should I follow?

Thanks for the help.

EDIT 1:

This is the code I use for detections:

void detect(Mat frame, std::vector<Rect> &objects)
{
    int i, div = 2;
    Mat frame_gray;
    // downscale the frame by div to speed up detection
    resize(frame, frame_gray, Size(frame.cols / div, frame.rows / div));
    cvtColor(frame_gray, frame_gray ...
(more)
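Note that because the snippet above downscales the frame by div before detecting, any boxes found must be scaled back up before being drawn on the original frame. A rough illustration of that logic (an illustrative Python sketch, not the poster's actual code):

```python
# Detections found on a frame downscaled by `div` live in downscaled
# coordinates; multiply them back by `div` to get full-resolution boxes.

def rescale_boxes(boxes, div):
    """boxes: list of (x, y, w, h) tuples found on the downscaled frame."""
    return [(x * div, y * div, w * div, h * div) for (x, y, w, h) in boxes]

print(rescale_boxes([(10, 20, 30, 40)], 2))  # [(20, 40, 60, 80)]
```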

1 answer


answered 2016-04-05 07:53:10 -0600 by StevenPuttemans

updated 2016-04-06 04:58:49 -0600

Let us start by formulating some answers

  • I am currently trying to train my own cascade based on Naotoshi Seo's tutorial and Coding Robin's tutorial. Though many online tutorials reference them, in my honest opinion they are seriously lacking in info, use old interfaces, use plain wrong settings, and are far from up to date. Avoid them and use more recent material instead, like the complete chapter on cascade classifier training in OpenCV 3 Blueprints, or the content of this forum, which is far better than those tutorials.
  • Yes, grayscale is the way to go, though if you supply colour images, opencv_createsamples will convert them to grayscale and apply histogram equalization itself. So you can provide annotations on coloured images.
  • Negative images can be as large as you prefer. They basically get sampled with windows of the positive sample size until the whole image is covered. They should, however, be equal to or larger than -w x -h, to avoid negatives being ignored.
  • You should save the entire positive image and create a positives.txt file containing annotations, which are basically the bounding boxes of the objects.
  • removed the background Please don't... in your application the background will also be present! You need to make your model robust to background noise.
  • You used the tool to warp images ... read my chapter in OpenCV 3 Blueprints or search this forum to find out why you should absolutely NOT do that, but rather collect meaningful positive samples!
  • I got a LOT of false positives This means that you have a model that still needs more training OR more training data, AND surely more negatives to discriminate the background. Use bootstrapping, also described in the chapter, to use your first model to improve a second one!
  • Negatives can be of any size, and positives too.
  • -w and -h can be as large as you want, BUT know that the larger they are, the more features they will contain AND thus the more memory you will need to store them during training! On a limited-memory system, try to get the sizes down as much as possible without losing valuable edge info.
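To get a feeling for how fast the feature count grows with -w and -h, here is a small sketch counting the five classic upright Viola-Jones Haar feature types that fit in a window. This is an illustration, not OpenCV's exact feature set (which also includes tilted features and is larger), but the growth behaviour is the same:

```python
# Count the classic five upright Haar-like feature types (two 2-rectangle,
# two 3-rectangle, one 4-rectangle) that fit in a w x h detection window.

def haar_feature_count(w, h):
    count = 0
    for sw, sh in [(2, 1), (1, 2), (3, 1), (1, 3), (2, 2)]:  # base shapes
        for fw in range(sw, w + 1, sw):          # feature widths
            for fh in range(sh, h + 1, sh):      # feature heights
                count += (w - fw + 1) * (h - fh + 1)  # placements
    return count

print(haar_feature_count(24, 24))  # 162336
print(haar_feature_count(48, 48))  # 2570880, roughly 16x more
```

Doubling the window side multiplies the feature pool by roughly sixteen, which is exactly the memory blow-up warned about above.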

Again, go read the book! It is a collection of two years of my PhD experience with this interface. You might be surprised how much info it contains!
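For reference, the positives.txt annotation file mentioned above uses one line per positive image: the image path, the number of objects in it, then x y width height for each bounding box. A minimal sketch that writes such a file (the image names here are made up):

```python
# Write an opencv_createsamples-style "info" file: one line per image,
# with the path, the object count, and x y w h for every box.

annotations = [
    ("pos/img0001.png", [(140, 100, 45, 45)]),
    ("pos/img0002.png", [(60, 80, 50, 50), (200, 90, 48, 48)]),
]

with open("positives.txt", "w") as f:
    for path, boxes in annotations:
        fields = [path, str(len(boxes))]
        for (x, y, w, h) in boxes:
            fields += [str(x), str(y), str(w), str(h)]
        f.write(" ".join(fields) + "\n")
```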

UPDATED ANSWER

Therefore, as I can take 50 images of my object from different angles and use them as positives. Would this be better?

I am of that opinion, yes. However, do accept that a cascade classifier is not completely viewpoint invariant, so if the viewpoint changes too drastically you might need 2 models. This can, for example, be seen in the face models, where we have a separate model for frontal faces and for profile faces. However, a face that is rotated 15 degrees from a profile face will still be detected by the profile face model, and vice versa for the frontal face.

For the negatives, I can do a 'random ... (more)


Comments


I read it and I agree ;-) Very helpful indeed!

Mathieu Barnachon ( 2016-04-05 09:46:06 -0600 )

StevenPuttemans ( 2016-04-06 04:17:53 -0600 )
1

I've started reading the book, but in the meantime I'd like to ask a few things so I can leave it training while I read. I read that you answered that it's better to have 50 good positive images than to take one and generate 50 with opencv_createsamples. Therefore, I can take 50 images of my object from different angles and use them as positives. Would this be better? The other thing is the negatives. As I want to detect the objects in a controlled environment (e.g. my office), I can do a 'random' walk gathering images without the object, right?

I also read that I should aim for a NEG count : acceptanceRatio around 0.0004 to consider a cascade good, and that ~5.3557e-05 means it is overtrained?

aripod ( 2016-04-06 04:30:28 -0600 )
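For what it's worth, those acceptanceRatio figures line up with simple per-stage arithmetic. This is my interpretation, assuming the default -maxFalseAlarmRate of 0.5 per stage: the ratio is the fraction of negative windows that still pass every trained stage, so after N stages the target is at most 0.5**N.

```python
# Expected ceiling on the NEG acceptanceRatio after N stages, assuming each
# stage lets through at most maxFalseAlarmRate of the negatives.

max_false_alarm_rate = 0.5
for n_stages in (10, 11, 14):
    print(n_stages, max_false_alarm_rate ** n_stages)
# 11 stages give about 4.9e-4, the same order as the ~0.0004 above, while
# ~5e-05 corresponds to roughly 14 stages' worth of rejection.
```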

Forgot to ask: by default the boosting algorithm is GAB. Which one gives better results? I believe they chose GAB as it is the one that consumes the least RAM? The computer I'm using has 32 GB of RAM plus another 32 GB of swap, so should I go with RAB?

aripod ( 2016-04-06 04:49:30 -0600 )

Look at the updates in my answer ;) I will add the last question too!

StevenPuttemans ( 2016-04-06 04:57:41 -0600 )

I thought the viewpoint variation was due to how the cascade was trained. So if I want to detect a phone, I should capture pictures of it from the front, train a model, rotate it a few degrees and train a new model again... but, let's say I end up with 3 models per object and I have 5 or 6 objects. This leads to ~15 models which I have to run on each frame to see if any of my objects is there. I guess it will take a lot of time and it won't be possible to have real-time detection, right? And I suppose the idea of using a turntable to capture as many pictures as possible of the object is out of the question...

And how do you calculate the precision and recall values after each stage to see if they increase or decrease ...(more)

aripod ( 2016-04-06 07:28:26 -0600 )

I've just gathered 60 images of the phone from different angles (a few degrees...) and heights, with 300 negative samples from randomly walking around the office... training took 1 or 2 minutes, and I got "Required leaf false alarm rate achieved" with NEG count : acceptanceRatio 250 : 0.000752328, but I still got around 25 false positives... I trained it with RAB...

aripod ( 2016-04-06 08:16:40 -0600 )

@aripod detecting objects in a completely rotation- and viewpoint-invariant way is indeed still something that can NOT be done in real time. The techniques and hardware needed for that are simply not yet standard, not even in research communities. It differs if you have a set of known parameters like symmetry, camera distance, lighting conditions, ..., but even then it is still very challenging! As to the turntable, it won't work, since the object will have the background of the turntable, and thus the edge features will not be representative for decent use, leading to false positive detections or no detections at all. For precision and recall calculations, take a look at Wikipedia; it explains them pretty well.

StevenPuttemans ( 2016-04-07 03:32:36 -0600 )

For the model trained, there is a difference between negative images and negative samples. Can you give me your training parameters? Then it will become clearer what you did. Your model actually tells you that it cannot reject negative windows as negative, so you need more negatives. Are your actual objects being detected? If not, you need more positives.

StevenPuttemans ( 2016-04-07 03:36:49 -0600 )

Maybe I don't use a turntable, but instead "I" rotate around the object to get the background? Here is my dataset. The object is rarely detected. Once again, thanks for your big help!

aripod ( 2016-04-07 04:27:34 -0600 )

Not sure why that would not work, it is quite the distinctive object you are using ...

StevenPuttemans ( 2016-04-07 06:12:58 -0600 )

I don't believe there's something wrong with the line I run for training: opencv_traincascade -data data -vec positives.vec -bg bg.txt -numPos 150 -numNeg 300 -numStages 10 -w 24 -h 24 ....

aripod ( 2016-04-07 06:55:48 -0600 )

Actually there is. You have 300 negative images, but are passing 300 to the -numNeg parameter, which is actually the number of negative windows, which is quite different from the number of images. Also, only 300 negatives against 200 positives will never yield a good classifier. Try at least something like 250 POS and 1000 NEG windows. Basically, you do not have enough training data for the moment to yield a better classifier.

StevenPuttemans ( 2016-04-07 07:00:57 -0600 )

Sorry for the dumb question... A window would be a section of an image, right?

aripod ( 2016-04-07 07:07:32 -0600 )

Well, the model is trained by resizing your positive annotations to the -w and -h parameters first. Then a sliding window of -w x -h size goes over the negative images and cuts out negative samples. BTW, why are you making those parameters only 24 pixels large?

StevenPuttemans ( 2016-04-07 07:10:16 -0600 )
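The sliding-window sampling described above also shows why -numNeg counts windows rather than images: a single large negative image yields many windows. A rough sketch (the fixed stride here is a simplifying assumption; opencv_traincascade samples differently and also across scales):

```python
# Count how many w x h windows fit in a W x H negative image at a given
# stride. One 1280x1024 frame already yields thousands of 24x24 windows.

def count_windows(W, H, w, h, stride):
    if W < w or H < h:
        return 0  # image smaller than the model window: ignored
    return ((W - w) // stride + 1) * ((H - h) // stride + 1)

print(count_windows(1280, 1024, 24, 24, 24))  # 2226 non-overlapping windows
```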

Ok, got it. So I don't need to get more negatives, but rather increase the windows for the training algorithm; but for the 250 pos I would need around 280 positives, right? I chose 24 based on the frontal face cascade... plus, if I make it bigger, the detection would be slower? I don't mind if the training is slow, but detection should be fast, as there will be more than 5 objects to detect, and considering a few cascades per object to make it rotation invariant, it might take a long time to run the detection (it needs to be ~ real time).

aripod ( 2016-04-07 07:14:12 -0600 )

I took 250 new images of the object (with different backgrounds) and 305 images of the office without the object, and left it training using opencv_traincascade -data data -vec positives.vec -bg bg.txt -numPos 220 -numNeg 1000 -numStages 10 -w 80 -h 80 -bt RAB. After 10 and a half hours I got:

POS count : consumed   220 : 221
NEG count : acceptanceRatio    1000 : 0.000370403
Required leaf false alarm rate achieved. Branch training terminated.

The result of the detection is: without the object there are false detections, and with the object there are also false detections plus no good ones...

aripod ( 2016-04-11 03:02:28 -0600 )

And how are you setting your detection parameters? You must be doing something wrong here... because your result seems impossible. Can you give me your detection code? Also, could you change the boosting back to the standard for a start?

StevenPuttemans ( 2016-04-11 03:37:49 -0600 )

I found the issue! In the detection I'm downscaling so it is faster, and I had commented out the rescaling of the bounding box. The code is in the question, as EDIT 1. Now the detection works, but I still have a lot of false positives. Look at this short video.

aripod ( 2016-04-11 04:52:04 -0600 )
1

Ow, your code detect_cascade.detectMultiScale(frame_gray, objects, 1.2, 2, 0|CV_HAAR_SCALE_IMAGE, Size(80, 80)); explains it all. You are setting the number of adjacent neighbours to 2, which means that weak detections get visualised as well. Stuff you can do to improve this:

  • Increase the minNeighbours parameter to at least 5 or 10. This will ensure that only higher-scoring detections remain.
  • However, some FP detections can still trigger a high certainty. Why not use a temporal approach and smooth over frames: only visualise a detection if it remains for more than 4-5 frames, else remove it. This will also remove a lot of FP detections.

StevenPuttemans ( 2016-04-11 06:04:37 -0600 )
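The temporal smoothing suggested above can be sketched as follows. This is an illustrative Python sketch of the idea, not an OpenCV API; matching boxes between frames by IoU overlap is an assumption of this sketch:

```python
# Report a detection only once a box has overlapped a matching box for
# several consecutive frames; one-frame flickers are discarded.

def iou(a, b):
    """Intersection-over-union of two (x, y, w, h) boxes."""
    ax, ay, aw, ah = a; bx, by, bw, bh = b
    ix = max(0, min(ax + aw, bx + bw) - max(ax, bx))
    iy = max(0, min(ay + ah, by + bh) - max(ay, by))
    inter = ix * iy
    union = aw * ah + bw * bh - inter
    return inter / union if union else 0.0

def persistent(frames, min_frames=4, min_iou=0.5):
    """frames: list of per-frame box lists; returns boxes stable for min_frames."""
    counts = []  # (box, consecutive frames it has been matched)
    for boxes in frames:
        counts = [
            (box, 1 + max((c for b, c in counts if iou(box, b) >= min_iou),
                          default=0))
            for box in boxes
        ]
    return [b for b, c in counts if c >= min_frames]
```

A box that drifts slightly around (10, 10, 50, 50) for five frames survives, while a detection that appears in only one frame is dropped.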

I changed minNeighbours to 5 and then 10, but I still have a lot of false positives. As for the temporal approach, these false detections also persist over time, so they would still be detected and never discarded... I'm running the training again with maxFalseAlarmRate set to 0.25 (half the default), switched back to GAB. And in parallel I'm running the same but with LBP rather than HAAR, to see which one works better... what do you think?

Should I get more positive samples to avoid the false negatives...? Or increase the negative windows?

aripod ( 2016-04-11 07:05:24 -0600 )

I would go for more negatives, since it can clearly find your positives. Try increasing it to, let's say, 2000 samples per stage. Also, LBP trains a lot faster than HAAR; stick to that for the time being. Putting maxFalseAlarmRate at 0.25 seems wrong to me, since that doesn't do what we expect from cascade classifiers. You will train harsher single stages, and thus a lot more features will need to be evaluated. You want to discard as many negatives as possible with as little effort as possible.

StevenPuttemans ( 2016-04-11 07:14:00 -0600 )
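A quick bit of arithmetic behind that advice (illustrative only): the overall negative acceptance after training is at most rate**numStages, so 0.25 per stage over 5 stages rejects exactly as much as 0.5 per stage over 10 stages, but each 0.25 stage needs far more features to train and evaluate.

```python
# Same overall rejection, different per-stage cost.
print(0.25 ** 5)   # 0.0009765625
print(0.5 ** 10)   # 0.0009765625, identical overall, but cheaper stages
```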

Should I get more negative images, or are the ones I have enough? Is just increasing numNeg enough? And do you think I should do all these tests with LBP until I get the right training values, and then retrain with HAAR? And what about the rotation of the object? It keeps detecting it when I rotate up to 45 degrees to each side...

aripod ( 2016-04-11 07:19:18 -0600 )

You will never know if you have enough negative images to grab windows from, as long as it doesn't throw an error. So start by simply increasing the numNeg value. Yes, do LBP until you get decent results; the drawback in accuracy is so minimal that I am not even using HAAR features anymore. Depending on the model and viewpoint, it is indeed somewhat rotation invariant on its own. I guess the border is around +-30 degrees. But you won't cover the full 360 degrees with one model.

StevenPuttemans ( 2016-04-11 07:34:09 -0600 )

And I guess the bigger the numNeg the better, i.e. more robust, right? I'll keep testing with that number and the negative samples I have for now, and get back to you with the results. Is there a way to specify how it gets the negative samples? It might be nice to specify a number of windows per image, to have this process more "controlled" and cover the full picture rather than sampling randomly. For the rotation, I don't need 360 degrees, but around 45 degrees to each side... which for now seems correct... in case I want to increase this, the solution is ALWAYS to get more samples, right?

aripod ( 2016-04-11 07:51:11 -0600 )


Stats

Asked: 2016-04-04 07:20:55 -0600

Seen: 2,121 times

Last updated: Apr 11 '16