# Help me with the opencv_traincascade training

-------------------------------------------------------------------------------------------------------------------------------------------
EDIT : This is the log from my second training attempt, after the first attempt produced a poor classifier.

E:\_102flowers-500X500>opencv_traincascade -data classifier -vec samples.vec -bg negatives.txt -numStages 20 -minHitRate 0.990 -maxFalseAlarmRate 0.4 -weightTrimRate 0.95 -numPos 4000 -numNeg 8188 -featureType LBP -w 60 -h 60 -mode ALL -precalcValBufSize 1024 -precalcIdxBufSize 1024
PARAMETERS:
vecFileName: samples.vec
bgFileName: negatives.txt
numPos: 4000
numNeg: 8188
numStages: 20
precalcValBufSize[Mb] : 1024
precalcIdxBufSize[Mb] : 1024
acceptanceRatioBreakValue : -1
stageType: BOOST
featureType: LBP
sampleWidth: 60
sampleHeight: 60
boostType: GAB
minHitRate: 0.99
maxFalseAlarmRate: 0.4
weightTrimRate: 0.95
maxDepth: 1
maxWeakCount: 100

===== TRAINING 0-stage =====
<BEGIN
POS count : consumed   4000 : 4000
NEG count : acceptanceRatio    8188 : 1
Precalculation time: 41.897
+----+---------+---------+
|  N |    HR   |    FA   |
+----+---------+---------+
|   1|        1|        1|
+----+---------+---------+
|   2|        1|        1|
+----+---------+---------+
|   3|        1|        1|
+----+---------+---------+
|   4|  0.99275| 0.689179|
+----+---------+---------+
|   5|  0.99625| 0.694187|
+----+---------+---------+
|   6|  0.99125|  0.50745|
+----+---------+---------+
|   7|  0.99025| 0.372008|
+----+---------+---------+
END>
Training until now has taken 0 days 1 hours 56 minutes 8 seconds.

===== TRAINING 1-stage =====
<BEGIN
POS count : consumed   4000 : 4043
NEG count : acceptanceRatio    8188 : 0.397727
Precalculation time: 37.581
+----+---------+---------+
|  N |    HR   |    FA   |
+----+---------+---------+
|   1|        1|        1|
+----+---------+---------+
|   2|        1|        1|
+----+---------+---------+
|   3|        1|        1|
+----+---------+---------+
|   4|   0.9955| 0.790914|
+----+---------+---------+
|   5|  0.99175|  0.71849|
+----+---------+---------+
|   6|    0.993| 0.640205|
+----+---------+---------+
|   7|  0.99075| 0.540669|
+----+---------+---------+
|   8|  0.99025| 0.496336|
+----+---------+---------+
|   9|  0.99025| 0.481803|
+----+---------+---------+
|  10|   0.9905| 0.392037|
+----+---------+---------+
END>
Training until now has taken 0 days 4 hours 33 minutes 30 seconds.

===== TRAINING 2-stage =====
<BEGIN
POS count : consumed   4000 : 4081
NEG count : acceptanceRatio    8188 : 0.164428
Precalculation time: 37.299
+----+---------+---------+
|  N |    HR   |    FA   |
+----+---------+---------+
|   1|        1|        1|
+----+---------+---------+
|   2|        1|        1|
+----+---------+---------+
|   3|        1|        1|
+----+---------+---------+
|   4|        1|        1|
+----+---------+---------+
|   5|   0.9925| 0.846605|
+----+---------+---------+
|   6|  0.99025| 0.682096|
+----+---------+---------+
|   7|    0.991| 0.709697|
+----+---------+---------+
|   8|    0.991| 0.665852|
+----+---------+---------+
|   9|  0.99125| 0.598559|
+----+---------+---------+
|  10|   0.9905| 0.605887|
+----+---------+---------+
|  11|  0.99075| 0.528334|
+----+---------+---------+
|  12|  0.99025| 0.484367|
+----+---------+---------+
|  13|  0.99025| 0.441622|
+----+---------+---------+
|  14|  0.99025| 0.386175|
+----+---------+---------+
END>
Training until now has taken 0 days 8 hours 9 minutes 56 seconds.

===== TRAINING 3-stage =====
<BEGIN
POS count : consumed   4000 : 4126
NEG count : acceptanceRatio    8188 : 0.0991043
Precalculation time: 42.651
+----+---------+---------+
|  N |    HR   |    FA   |
+----+---------+---------+
|   1|        1|        1|
+----+---------+---------+
|   2|        1|        1|
+----+---------+---------+
|   3|        1|        1|
+----+---------+---------+
|   4|    0.992| 0.651075|
+----+---------+---------+
|   5|  0.99675| 0.805691|
+----+---------+---------+
|   6|  0.99175| 0.533341|
+----+---------+---------+
|   7|  0.99075| 0.528212|
+----+---------+---------+
|   8|   0.9905| 0.468735|
+----+---------+---------+
|   9|  0.99025| 0.461651|
+----+---------+---------+
|  10|   0.9905| 0.403762|
+----+---------+---------+
|  11|  0.99025| 0.382145|
+----+---------+---------+
END>
Training until now has taken 0 days 11 hours 16 minutes 56 seconds.

===== TRAINING 4-stage =====
<BEGIN
POS count : consumed   4000 : 4191
NEG count : acceptanceRatio    8188 : 0.032643
Precalculation time: 37.487
+----+---------+---------+
|  N |    HR   |    FA   |
+----+---------+---------+
|   1|        1|        1|
+----+---------+---------+
|   2|        1|        1|
+----+---------+---------+
|   3|        1|        1|
+----+---------+---------+
|   4|  0.99175| 0.846727|
+----+---------+---------+
|   5|  0.99275|  0.85723|
+----+---------+---------+
|   6|  0.99075| 0.787982|
+----+---------+---------+
|   7|    0.992|     0.75|
+----+---------+---------+
|   8|  0.99025| 0.680752|
+----+---------+---------+
|   9|   0.9905|  0.62958|
+----+---------+---------+
|  10|  0.99025| 0.632877|
+----+---------+---------+
|  11|  0.99025| 0.553127|
+----+---------+---------+
|  12 ...

Please pay some attention to what you are doing. Looking at your profile your last ten questions have been about the same interface. There is a button at the bottom of each topic that allows you to edit your question. Please make use of that in the future.

( 2015-04-25 03:34:51 -0500 )

@StevenPuttemans : Sorry for that. Will edit it in the future. (I mean this one.)

( 2015-04-25 03:37:06 -0500 )


... with a total of 10,000 positives and 8188 negatives of a flower ...

• This will simply not work. You need more negatives than positives to be able to differentiate positives from different backgrounds. Also, I get the impression you are mixing up the concepts of negative images (which are gathered manually) and negative windows (which are set with -numNeg and which are automatically sampled, at model size, from the negative set you supplied).
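
To make the negative-image list concrete, here is a minimal sketch (directory layout assumed) of how a `negatives.txt` is typically generated — one full image path per line; traincascade then crops `-numNeg` windows out of these images at the 60x60 model size:

```python
import os

def write_negatives_list(neg_dir, out_path="negatives.txt"):
    """List one full negative image path per line; opencv_traincascade
    samples negative *windows* from these images at the model size.
    Returns the number of image paths written."""
    exts = (".jpg", ".jpeg", ".png", ".bmp")
    count = 0
    with open(out_path, "w") as f:
        for root, _, files in os.walk(neg_dir):
            for name in sorted(files):
                if name.lower().endswith(exts):
                    f.write(os.path.join(root, name) + "\n")
                    count += 1
    return count
```

So a few thousand negative images can easily yield far more than 8188 negative windows; the two numbers are not the same thing.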

... to create 10,000 positives from 250 actual positives and 8188 negatives ...

• Again, I strongly advise against this practice. In an actual application this will NOT work. Better to train a model on those 250 actual flower images than to generate 10,000 artificial samples from them. If you add the -show parameter to the createsamples tool, you will notice that many of the generated samples look unnatural and will never occur in your application. They will just clutter your model.

... continue the training further(given my current acceptance ratio)? ...

YES. As said in the other topic, I am convinced that you should train up to the first stage where the acceptanceRatio drops below 10e-5. Beyond that point you will overfit your data; before it, your model will be too generic and yield too many false positive detections. (The -acceptanceRatioBreakValue parameter — still at its disabled default of -1 in your log — can stop training there automatically.)

... it takes a lot of time ...

That is due to the number of samples you have. And looking at how complex your model is (many features per stage), your data is also fairly hard to separate. Besides, "a lot of time" is relative without a proper indication.


Okay, first off, thank you for all the answers you have given up until now, for my questions and everybody else's (I have noticed). Second, right... you said -numNeg is the number of windows chosen by the cascade. This implies I have had the wrong impression about the parameters (due to those guides). I will continue this training, but for the next iteration I am thinking of using -numPos as half of my total positive samples and -numNeg as double of -numPos, instead of putting in the actual number of images I have. Will this be okay?

( 2015-04-25 04:05:04 -0500 )

Also, I want to tell you that I have 102 flower categories, each with many images, totaling 8189. What I have done is use 250 images out of one category as positives and put the rest (of different flowers) under negatives (except those 250). Did I do this right?

( 2015-04-25 04:07:12 -0500 )

1. Yes, your new suggestion about -numPos and -numNeg seems better.
2. This won't work for flower classification into species ... the variance will not be large enough to get a decent acceptanceRatio, I am guessing. This is also the reason why your model is so complex.

Actually I am getting the impression you are using the wrong techniques for the wrong purpose.

• Cascade classifiers with boosting are meant for object detection/localisation based on the overall object class shape.
• Therefore they only give you a location of the flower
• Then you should use a multi class feature descriptor and a machine learning approach to get the correct flower species.

So to start the training of the flower model, I would suggest using about 1000 images of each category.

( 2015-04-25 06:30:11 -0500 )
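
The detection-then-classification split described above can be sketched as follows (all function names here are hypothetical placeholders — the cascade only answers "where is a flower?", a separate multi-class model answers "which species?"):

```python
def detect_and_classify(image, detect_flowers, classify_species):
    """Two-step pipeline:
    detect_flowers   -- cascade-style detector returning (x, y, w, h) boxes
    classify_species -- multi-class model applied to each cropped box
    Returns a list of (box, species) pairs."""
    results = []
    for (x, y, w, h) in detect_flowers(image):
        crop = [row[x:x + w] for row in image[y:y + h]]  # cut out the detection
        results.append(((x, y, w, h), classify_species(crop)))
    return results
```

The boosted cascade never sees species labels, so trying to make it separate one flower species from 101 others is asking it for something it was not designed to do.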

@StevenPuttemans : Okay, you were right! The classifier after 12 stages is poor. I tested it; it detected 2-3 features in each flower for positives (including one leaf-like feature), but when I tested it with negatives (bushes, ferns and other flowers), it detected one or two features in each of them too. The only cause I can attribute this to is the presence of a leafy background in my positives (every single one of them) and the untuned parameters that I chose. Well... accepted, fine! I am about to start my next training iteration. For that, I have first "blackened" the leafy background, and now this is where I am stuck. I have 250 actual blackened positives. How do I generate a vec file out of them all? The opencv_createsamples utility seems to distort them and mix them with negatives no matter what.

( 2015-04-25 22:23:56 -0500 )

@StevenPuttemans : Is this "blackening" that I did sensible?
P.S. : I don't have 1000 positives, so I am going to go ahead with 250 only. :(
So basically I am planning to use 250 positives and 8188 negatives!!

( 2015-04-25 22:26:11 -0500 )

@pulp_fiction it is because you train with positives that contain too much background information. You need to fit your training windows just around each flower or it won't work ... then no leaves will be trained as features. Blackening it out is a bad idea, since it again creates edges that are not there in real situations, and thus your model will be looking for that strong edgeness. You just said you have 102 flower categories totalling about 8000 flowers, so you have more than 250 samples ... read my previous post again. You can use the createsamples utility with a text file that says where the regions of interest are. Then no warping is applied.

( 2015-04-26 04:41:37 -0500 )
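
The annotation-file approach mentioned above can be sketched like this (the file names and box values are hypothetical; the line format is the one `opencv_createsamples -info` documents):

```python
def write_info_file(annotations, out_path="positives.txt"):
    """Write the 'info' file opencv_createsamples -info expects.
    annotations: {image_path: [(x, y, w, h), ...]} -- each box fitted
    tightly around one flower, with no background inside.
    One line per image:  path  box-count  x1 y1 w1 h1 [x2 y2 w2 h2 ...]"""
    with open(out_path, "w") as f:
        for path, boxes in sorted(annotations.items()):
            fields = [path, str(len(boxes))]
            for (x, y, w, h) in boxes:
                fields.extend([str(x), str(y), str(w), str(h)])
            f.write(" ".join(fields) + "\n")

# With that file in place the vec is built WITHOUT any warping, e.g.:
#   opencv_createsamples -info positives.txt -vec samples.vec -num 250 -w 60 -h 60
```

This is the "many-without-distortions" route: real crops go into the vec file unchanged, instead of one image being artificially warped onto negatives.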

Am I the only one to notice that -numPos was set to 1521 out of 10,000 available samples?

( 2015-04-26 15:35:02 -0500 )

No, you are not. He misinterpreted the worst-case scenario formula given by the original author of the algorithm implementation.

( 2015-04-26 15:38:48 -0500 )
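
The worst-case formula referred to here can be sketched as a small helper (a conservative estimate — the exact number of rejected positives S is unknowable in advance, so `slack` is just a pessimistic guess):

```python
def max_safe_num_pos(vec_samples, num_stages, min_hit_rate, slack=0):
    """Worst case, each new stage may consume up to an extra
    (1 - minHitRate) * numPos positives from the vec file, so roughly:
        numPos + (numStages - 1) * (1 - minHitRate) * numPos + S <= vec_samples
    where S is the (unknown) count of positives rejected along the way;
    'slack' stands in for S.  Solve for the largest safe numPos."""
    return int((vec_samples - slack) /
               (1 + (num_stages - 1) * (1 - min_hit_rate)))
```

With the 10,000 vec samples and the settings above, `max_safe_num_pos(10000, 20, 0.99)` gives roughly 8400, so -numPos 4000 leaves a comfortable margin, while 1521 was far more conservative than necessary.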

@Gino Strato : I guess this is part of the "learning curve", as they say. If not for this site (and SO) I would have abandoned my project and changed topics. I am doing the training again with -numPos 4000 and -numNeg 8188 (numPos:numNeg = 1:2 approx.) and it is taking double the time. This time there are already 7 weak classifiers in the starting stage. It goes like:
Stage 0 : 7
Stage 1 : 10
Stage 2 : 14
Stage 3 : 11
Stage 4 : 17
I think it is because I have set -maxFalseAlarmRate 0.4.

( 2015-04-26 22:04:55 -0500 )

@StevenPuttemans : When I said blackened, I did blacken the background; then I chose -bgthresh 10 and -bgcolor 10, filtering out the blackish region with opencv_createsamples. I checked it with -show, and the placement of the flower was without any black mask.
Also, I still want to ask you (rather, confirm) that opencv_createsamples marks my object's coordinates first and then places it in the negative, so it still has my object marked in the big negative background image, right?

( 2015-04-26 22:10:20 -0500 )
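
For reference, this is how the -bgcolor/-bgthresh pair interacts when createsamples composites a sample (a sketch of the documented behaviour, not the actual OpenCV source):

```python
def is_transparent(pixel, bgcolor=10, bgthresh=10):
    """With -bgcolor 10 -bgthresh 10, opencv_createsamples treats every
    grayscale intensity in [bgcolor - bgthresh, bgcolor + bgthresh]
    (here [0, 20]) as transparent background; those pixels are replaced
    by the underlying negative image when the sample is composited."""
    return bgcolor - bgthresh <= pixel <= bgcolor + bgthresh
```

So any genuinely dark flower pixels in that [0, 20] band would also be punched out and replaced by negative background, which is another reason blackening is risky.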

See my EDIT for new data. It's fairly complex, with many features.

( 2015-04-26 22:53:30 -0500 )

@pulp_fiction, take a look here; it is an old tutorial describing partly the old and partly the new interface. You will see that opencv_createsamples has two possible ways of generating your positives.vec. I discourage you from using your current approach, which is the many-from-one one in the tutorial; rather use the many-without-distortions approach, which will work WAY better in your case!

( 2015-04-27 03:12:28 -0500 )

Unless you: 1) gather many more samples, 2) decide in advance on a criterion to crop the area of the flower, 3) possibly restrict your detection goal to flowers from a certain point of view only, and 4) understand that you cannot use AdaBoost to classify flowers, but only to detect where a generic flower is located, you will never obtain a decent result.

( 2015-04-28 15:07:07 -0500 )

@Gino Strato, nice wrapup of all the comments :D

( 2015-04-29 02:51:57 -0500 )

@Gino Strato : Okay, you are 100 percent correct there. I finished training the classifier. Now it detects my "Passion Flower", but also some other flowers and leaves too. I guess I can't go for species classification through AdaBoost. Can you suggest another method for species classification, one with several resources available to learn and practice from, and which doesn't necessarily require a PhD?

( 2015-04-29 06:37:32 -0500 )

Is neuroph a good choice?

( 2015-04-29 06:51:27 -0500 )

I don't know Neuroph, but I think that training a NN would be a good choice (a NN with multiple outputs, each one corresponding to a class to be recognized).

( 2015-04-29 14:13:49 -0500 )
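
To make the multi-output idea concrete, a minimal framework-agnostic sketch of the class encoding (Neuroph, OpenCV's ANN_MLP, or any other NN library works the same way; the function names here are hypothetical):

```python
def one_hot(label, classes):
    """Target vector for a multi-output network: one output neuron per
    flower species, 1.0 for the true class and 0.0 everywhere else."""
    return [1.0 if c == label else 0.0 for c in classes]

def predicted_class(outputs, classes):
    """At prediction time, the species is the class whose output neuron
    fires strongest (argmax over the output layer)."""
    best = max(range(len(outputs)), key=lambda i: outputs[i])
    return classes[best]
```

With 102 categories the network would have 102 output neurons, one per species, trained against these one-hot targets.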
