
About opencv_traincascade....

asked 2016-04-04 07:20:55 -0600 by aripod

updated 2016-04-11 04:57:54 -0600

Hello,

I am currently trying to train my own cascade based on Naotoshi Seo's tutorial and Coding Robin's tutorial, but I still have a couple of questions that I haven't found answers to.

I will have some objects to detect in a controlled scenario (it will always be the same room and the objects will not vary). My idea is therefore to grab my camera and save frames of the room for a certain amount of time with NONE of the objects present, to gather the negative images. Then I would put the objects I want to detect on a turntable (one at a time, of course...), set the camera on a tripod and, for different heights, choose an ROI surrounding the object and choose when to start and stop saving images while the object rotates. Thus I would have several views of the same object from different angles, and I can get the X, Y position plus the size of the bounding box, and easily save a file with the path, the number of objects in the scene and these four parameters, to create the .vec file.

My questions are:

  1. I should save the images as grey scale, right?
  2. At which resolution should I save the negative images? (My camera is 1280x1024.) Original, or resized to...?
  3. Should I save the entire image or just the ROI for the positive images?

I'd like to test this because, as a first approach, I took a picture of an object with my phone, cropped it and removed the background (a 50x50 grey scale image), and used opencv_createsamples plus the negatives that I captured as described before (saved as 100x100 grey scale).

Then, to get my positive samples for training, I ran:

opencv_createsamples -img mouse1Resized.png -bg bg.txt -info pos/info.lst -jpgoutput pos/ -maxxangle 0.5 -maxyangle 0.5 -maxzangle 0.5 -num 1690 -bgcolor 0 -bgthresh 0

where 1690 is the number of negative images that I captured. Then I create the vec file with:

opencv_createsamples -info pos/info.lst -num 1690 -w 20 -h 20 -vec positives.vec

And start training with:

opencv_traincascade -data data -vec positives.vec -bg bg.txt -numPos 1400 -numNeg 700 -numStages 15 -w 20 -h 20

When this finished, I tried the detector and I got a LOT of false positives, even when the object was not in the scene.

So here are some more questions.

  1. Should the negatives be 100x100?
  2. Should the positives be 50x50?
  3. When I create the .vec file, how large can -w and -h be?

I would like to test different approaches to see which gives the best results... or, based on your experience, which one should I follow?

Thanks for the help.

EDIT 1:

This is the code I use for detections:

void detect(Mat frame, std::vector<Rect> &objects)
{
    int i, div = 2;
    Mat frame_gray;
    // downscale the frame by div to speed up detection
    resize(frame, frame_gray, Size(frame.cols / div, frame.rows / div));
    cvtColor(frame_gray, frame_gray ...
(more)
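Note that because the snippet above downscales the frame by div before detecting, any boxes found must be scaled back up before being drawn on the original frame. A rough illustration of that logic (an illustrative Python sketch, not the poster's actual code):

```python
# Detections found on a frame downscaled by `div` live in downscaled
# coordinates; multiply them back by `div` to get full-resolution boxes.

def rescale_boxes(boxes, div):
    """boxes: list of (x, y, w, h) tuples found on the downscaled frame."""
    return [(x * div, y * div, w * div, h * div) for (x, y, w, h) in boxes]

print(rescale_boxes([(10, 20, 30, 40)], 2))  # [(20, 40, 60, 80)]
```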

1 answer


answered 2016-04-05 07:53:10 -0600 by StevenPuttemans

updated 2016-04-06 04:58:49 -0600

Let us start by formulating some answers

  • I am currently trying to train my own cascade based on Naotoshi Seo's tutorial and Coding Robin's tutorial. Though many online tutorials reference them, in my honest opinion they are seriously lacking in info, use old interfaces, use plain wrong settings, and are far from up to date. Avoid them and use more recent material instead, like the complete chapter on cascade classifier training in OpenCV 3 Blueprints, or the content of this forum, which is far better than those tutorials.
  • Yes, grayscale is the way to go, though if you supply colour images, opencv_createsamples will convert them to grayscale and apply histogram equalization itself. So you can provide annotations on coloured images.
  • Negative images can be as large as you prefer. They basically get sampled with windows of the positive sample size until the whole image is covered. They should, however, be equal to or larger than -w x -h, to avoid negatives being ignored.
  • You should save the entire positive image and create a positives.txt file containing annotations, which are basically the bounding boxes of the objects.
  • removed the background Please don't... in your application the background will also be present! You need to make your model robust to background noise.
  • You used the tool to warp images ... read my chapter in OpenCV 3 Blueprints or search this forum to find out why you should absolutely NOT do that, but rather collect meaningful positive samples!
  • I got a LOT of false positives This means that you have a model that still needs more training OR more training data, AND surely more negatives to discriminate the background. Use bootstrapping, also described in the chapter, to use your first model to improve a second one!
  • Negatives can be of any size, and positives too.
  • -w and -h can be as large as you want, BUT know that the larger they are, the more features they will contain AND thus the more memory you will need to store them during training! On a limited-memory system, try to get the sizes down as much as possible without losing valuable edge info.
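To get a feeling for how fast the feature count grows with -w and -h, here is a small sketch counting the five classic upright Viola-Jones Haar feature types that fit in a window. This is an illustration, not OpenCV's exact feature set (which also includes tilted features and is larger), but the growth behaviour is the same:

```python
# Count the classic five upright Haar-like feature types (two 2-rectangle,
# two 3-rectangle, one 4-rectangle) that fit in a w x h detection window.

def haar_feature_count(w, h):
    count = 0
    for sw, sh in [(2, 1), (1, 2), (3, 1), (1, 3), (2, 2)]:  # base shapes
        for fw in range(sw, w + 1, sw):          # feature widths
            for fh in range(sh, h + 1, sh):      # feature heights
                count += (w - fw + 1) * (h - fh + 1)  # placements
    return count

print(haar_feature_count(24, 24))  # 162336
print(haar_feature_count(48, 48))  # 2570880, roughly 16x more
```

Doubling the window side multiplies the feature pool by roughly sixteen, which is exactly the memory blow-up warned about above.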

Again, go read the book! It is a collection of two years of my PhD experience with this interface. You might be surprised how much info it contains!
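For reference, the positives.txt annotation file mentioned above uses one line per positive image: the image path, the number of objects in it, then x y width height for each bounding box. A minimal sketch that writes such a file (the image names here are made up):

```python
# Write an opencv_createsamples-style "info" file: one line per image,
# with the path, the object count, and x y w h for every box.

annotations = [
    ("pos/img0001.png", [(140, 100, 45, 45)]),
    ("pos/img0002.png", [(60, 80, 50, 50), (200, 90, 48, 48)]),
]

with open("positives.txt", "w") as f:
    for path, boxes in annotations:
        fields = [path, str(len(boxes))]
        for (x, y, w, h) in boxes:
            fields += [str(x), str(y), str(w), str(h)]
        f.write(" ".join(fields) + "\n")
```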

UPDATED ANSWER

Therefore, as I can take 50 images of my object from different angles and use them as positives. Would this be better?

I am of that opinion, yes. However, do accept that a cascade classifier is not completely viewpoint invariant, so if the viewpoint changes too drastically you might need 2 models. This can, for example, be seen in the face models, where we have a separate model for frontal faces and for profile faces. However, a face that is rotated 15 degrees from a profile face will still be detected by the profile face model, and vice versa for the frontal face.

For the negatives, I can do a 'random ... (more)


Comments


I read it and I agree ;-) Very helpful indeed!

Mathieu Barnachon ( 2016-04-05 09:46:06 -0600 )

StevenPuttemans ( 2016-04-06 04:17:53 -0600 )
1

I've started reading the book, but in the meantime I'd like to ask a few things so I can leave it training while I read. I read that you answered that it's better to have 50 good positive images than to take one and generate 50 with opencv_createsamples. Therefore, I can take 50 images of my object from different angles and use them as positives. Would this be better? The other thing is the negatives. As I want to detect the objects in a controlled environment (e.g. my office), I can do a 'random' walk gathering images without the object, right?

I also read that I should aim for a NEG count : acceptanceRatio around 0.0004 to consider a cascade good, and that ~5.3557e-05 means it is overtrained?

aripod ( 2016-04-06 04:30:28 -0600 )
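For what it's worth, those acceptanceRatio figures line up with simple per-stage arithmetic. This is my interpretation, assuming the default -maxFalseAlarmRate of 0.5 per stage: the ratio is the fraction of negative windows that still pass every trained stage, so after N stages the target is at most 0.5**N.

```python
# Expected ceiling on the NEG acceptanceRatio after N stages, assuming each
# stage lets through at most maxFalseAlarmRate of the negatives.

max_false_alarm_rate = 0.5
for n_stages in (10, 11, 14):
    print(n_stages, max_false_alarm_rate ** n_stages)
# 11 stages give about 4.9e-4, the same order as the ~0.0004 above, while
# ~5e-05 corresponds to roughly 14 stages' worth of rejection.
```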

Forgot to ask: by default the boosting algorithm is GAB. Which one gives better results? I believe they chose GAB as it is the one that consumes the least RAM? The computer I'm using has 32 GB of RAM plus another 32 GB of swap, so should I go with RAB?

aripod ( 2016-04-06 04:49:30 -0600 )

Look at the updates in my answer ;) I will add the last question too!

StevenPuttemans ( 2016-04-06 04:57:41 -0600 )

I thought the viewpoint variation was due to how the cascade was trained. So if I want to detect a phone, I should capture pictures of it from the front, train a model, rotate it a few degrees and train a new model again... but, let's say I end up with 3 models per object and I have 5 or 6 objects. This leads to ~15 models which I have to run on each frame to see if any of my objects is there. I guess it will take a lot of time and it won't be possible to have real-time detection, right? And I suppose the idea of using a turntable to capture as many pictures as possible of the object is out of the question...

And how do you calculate the precision and recall values after each stage to see if they increase or decrease ...(more)

aripod ( 2016-04-06 07:28:26 -0600 )

I've just gathered 60 images of the phone from different angles (a few degrees...) and heights, with 300 negative samples from randomly walking around the office... training took 1 or 2 minutes, and I got "Required leaf false alarm rate achieved" with NEG count : acceptanceRatio 250 : 0.000752328, but I still got around 25 false positives... I trained it with RAB...

aripod ( 2016-04-06 08:16:40 -0600 )

@aripod detecting objects in a completely rotation- and viewpoint-invariant way is indeed still something that can NOT be done in real time. The techniques and hardware needed for that are simply not yet standard, not even in research communities. It differs if you have a set of known parameters like symmetry, camera distance, lighting conditions, ..., but even then it is still very challenging! As to the turntable, it won't work, since the object will have the background of the turntable, and thus the edge features will not be representative for decent use, leading to false positive detections or no detections at all. For precision and recall calculations, take a look at Wikipedia; it explains them pretty well.

StevenPuttemans ( 2016-04-07 03:32:36 -0600 )

For the model trained, there is a difference between negative images and negative samples. Can you give me your training parameters? Then it will become clearer what you did. Your model actually tells you that it cannot reject negative windows as negative, so you need more negatives. Are your actual objects being detected? If not, you need more positives.

StevenPuttemans ( 2016-04-07 03:36:49 -0600 )

Maybe I don't use a turntable, but instead "I" rotate around the object to get the background? Here is my dataset. The object is rarely detected. Once again, thanks for your big help!

aripod ( 2016-04-07 04:27:34 -0600 )

Not sure why that would not work, it is quite the distinctive object you are using ...

StevenPuttemans ( 2016-04-07 06:12:58 -0600 )

I don't believe there's something wrong with the line I run for training: opencv_traincascade -data data -vec positives.vec -bg bg.txt -numPos 150 -numNeg 300 -numStages 10 -w 24 -h 24 ....

aripod ( 2016-04-07 06:55:48 -0600 )

Actually there is. You have 300 negative images, but are passing 300 to the -numNeg parameter, which is actually the number of negative windows, which is quite different from the number of images. Also, only 300 negatives against 200 positives will never yield a good classifier. Try at least something like 250 POS and 1000 NEG windows. Basically, you do not have enough training data for the moment to yield a better classifier.

StevenPuttemans ( 2016-04-07 07:00:57 -0600 )

Sorry for the dumb question... A window would be a section of an image, right?

aripod ( 2016-04-07 07:07:32 -0600 )

Well, the model is trained by resizing your positive annotations to the -w and -h parameters first. Then a sliding window of -w x -h size goes over the negative images and cuts out negative samples. BTW, why are you making those parameters only 24 pixels large?

StevenPuttemans ( 2016-04-07 07:10:16 -0600 )
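The sliding-window sampling described above also shows why -numNeg counts windows rather than images: a single large negative image yields many windows. A rough sketch (the fixed stride here is a simplifying assumption; opencv_traincascade samples differently and also across scales):

```python
# Count how many w x h windows fit in a W x H negative image at a given
# stride. One 1280x1024 frame already yields thousands of 24x24 windows.

def count_windows(W, H, w, h, stride):
    if W < w or H < h:
        return 0  # image smaller than the model window: ignored
    return ((W - w) // stride + 1) * ((H - h) // stride + 1)

print(count_windows(1280, 1024, 24, 24, 24))  # 2226 non-overlapping windows
```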

Ok, got it. So I don't need to get more negatives, but rather increase the windows for the training algorithm; but for the 250 pos I would need around 280 positives, right? I chose 24 based on the frontal face cascade... plus, if I make it bigger, the detection would be slower? I don't mind if the training is slow, but detection should be fast, as there will be more than 5 objects to detect, and considering a few cascades per object to make it rotation invariant, it might take a long time to run the detection (it needs to be ~ real time).

aripod ( 2016-04-07 07:14:12 -0600 )

I took 250 new images of the object (with different backgrounds) and 305 images of the office without the object, and left it training using opencv_traincascade -data data -vec positives.vec -bg bg.txt -numPos 220 -numNeg 1000 -numStages 10 -w 80 -h 80 -bt RAB. After 10 and a half hours I got:

POS count : consumed   220 : 221
NEG count : acceptanceRatio    1000 : 0.000370403
Required leaf false alarm rate achieved. Branch training terminated.

The result of the detection is: without the object there are false detections, and with the object there are also false detections plus no good ones...

aripod ( 2016-04-11 03:02:28 -0600 )

And how are you setting your detection parameters? You must be doing something wrong here... because your result seems impossible. Can you give me your detection code? Also, could you change the boosting back to the standard for a start?

StevenPuttemans ( 2016-04-11 03:37:49 -0600 )

I found the issue! In the detection I'm downscaling so it is faster, and I had commented out the rescaling of the bounding box. The code is in the question, as EDIT 1. Now the detection works, but I still have a lot of false positives. Look at this short video.

aripod ( 2016-04-11 04:52:04 -0600 )
1

Ow, your code detect_cascade.detectMultiScale(frame_gray, objects, 1.2, 2, 0|CV_HAAR_SCALE_IMAGE, Size(80, 80)); explains it all. You are setting the number of adjacent neighbours to 2, which means that weak detections get visualised as well. Stuff you can do to improve this:

  • Increase the minNeighbours parameter to at least 5 or 10. This will ensure that only higher-scoring detections remain.
  • However, some FP detections can still trigger a high certainty. Why not use a temporal approach and smooth over frames: only visualise a detection if it remains for more than 4-5 frames, else remove it. This will also remove a lot of FP detections.

StevenPuttemans ( 2016-04-11 06:04:37 -0600 )
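The temporal smoothing suggested above can be sketched as follows. This is an illustrative Python sketch of the idea, not an OpenCV API; matching boxes between frames by IoU overlap is an assumption of this sketch:

```python
# Report a detection only once a box has overlapped a matching box for
# several consecutive frames; one-frame flickers are discarded.

def iou(a, b):
    """Intersection-over-union of two (x, y, w, h) boxes."""
    ax, ay, aw, ah = a; bx, by, bw, bh = b
    ix = max(0, min(ax + aw, bx + bw) - max(ax, bx))
    iy = max(0, min(ay + ah, by + bh) - max(ay, by))
    inter = ix * iy
    union = aw * ah + bw * bh - inter
    return inter / union if union else 0.0

def persistent(frames, min_frames=4, min_iou=0.5):
    """frames: list of per-frame box lists; returns boxes stable for min_frames."""
    counts = []  # (box, consecutive frames it has been matched)
    for boxes in frames:
        counts = [
            (box, 1 + max((c for b, c in counts if iou(box, b) >= min_iou),
                          default=0))
            for box in boxes
        ]
    return [b for b, c in counts if c >= min_frames]
```

A box that drifts slightly around (10, 10, 50, 50) for five frames survives, while a detection that appears in only one frame is dropped.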

I changed minNeighbours to 5 and then 10, but I still have a lot of false positives. As for the temporal approach, these false detections also persist over time, so they would still be detected and never discarded... I'm running the training again with maxFalseAlarmRate set to 0.25 (half the default), switched back to GAB. And in parallel I'm running the same but with LBP rather than HAAR, to see which one works better... what do you think?

Should I get more positive samples to avoid the false negatives...? Or increase the negative windows?

aripod ( 2016-04-11 07:05:24 -0600 )

I would go for more negatives, since it can clearly find your positives. Try increasing it to, let's say, 2000 samples per stage. Also, LBP trains a lot faster than HAAR; stick to that for the time being. Putting maxFalseAlarmRate at 0.25 seems wrong to me, since that doesn't do what we expect from cascade classifiers. You will train harsher single stages, and thus a lot more features will need to be evaluated. You want to discard as many negatives as possible with as little effort as possible.

StevenPuttemans ( 2016-04-11 07:14:00 -0600 )
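A quick bit of arithmetic behind that advice (illustrative only): the overall negative acceptance after training is at most rate**numStages, so 0.25 per stage over 5 stages rejects exactly as much as 0.5 per stage over 10 stages, but each 0.25 stage needs far more features to train and evaluate.

```python
# Same overall rejection, different per-stage cost.
print(0.25 ** 5)   # 0.0009765625
print(0.5 ** 10)   # 0.0009765625, identical overall, but cheaper stages
```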

Should I get more negative images, or are the ones I have enough? Is just increasing numNeg enough? And do you think I should do all these tests with LBP until I get the right training values, and then retrain with HAAR? And what about the rotation of the object? It keeps detecting it when I rotate up to 45 degrees to each side...

aripod ( 2016-04-11 07:19:18 -0600 )

You will never know if you have enough negative images to grab windows from, as long as it doesn't throw an error. So start by simply increasing the numNeg value. Yes, do LBP until you get decent results; the drawback in accuracy is so minimal that I am not even using HAAR features anymore. Depending on the model and viewpoint, it is indeed somewhat rotation invariant on its own. I guess the border is around +-30 degrees. But you won't cover the full 360 degrees with one model.

StevenPuttemans ( 2016-04-11 07:34:09 -0600 )

And I guess the bigger the numNeg the better, i.e. more robust, right? I'll keep testing with that number and the negative samples I have for now, and get back to you with the results. Is there a way to specify how it gets the negative samples? It might be nice to specify a number of windows per image, to have this process more "controlled" and cover the full picture rather than sampling randomly. For the rotation, I don't need 360 degrees, but around 45 degrees to each side... which for now seems correct... in case I want to increase this, the solution is ALWAYS to get more samples, right?

aripod ( 2016-04-11 07:51:11 -0600 )


Stats

Asked: 2016-04-04 07:20:55 -0600

Seen: 2,121 times

Last updated: Apr 11 '16