# Haartraining process displays a large table of data

Hi everybody,

My haartraining process has been running for 3 days, is at 15 nodes (30 max), and displays the following trace:

+----+----+-+---------+---------+---------+---------+
|64082|  5%|-|-5600.975098| 1.000000| 1.000000| 0.810449|
+----+----+-+---------+---------+---------+---------+
|64083|  5%|-|-5600.975098| 1.000000| 1.000000| 0.810449|
+----+----+-+---------+---------+---------+---------+
|64084|  5%|-|-5599.975098| 1.000000| 1.000000| 0.810449|
+----+----+-+---------+---------+---------+---------+
|64085|  5%|-|-5600.975098| 1.000000| 1.000000| 0.810449|
+----+----+-+---------+---------+---------+---------+
|64086|  5%|-|-5599.975098| 1.000000| 1.000000| 0.810449|
+----+----+-+---------+---------+---------+---------+
|64087|  5%|-|-5599.975098| 1.000000| 1.000000| 0.810449|
+----+----+-+---------+---------+---------+---------+
|64088|  5%|-|-5600.975098| 1.000000| 1.000000| 0.810449|
+----+----+-+---------+---------+---------+---------+
|64089|  5%|-|-5600.975098| 1.000000| 1.000000| 0.810449|
+----+----+-+---------+---------+---------+---------+
|64090|  5%|-|-5599.975098| 1.000000| 1.000000| 0.810449|
+----+----+-+---------+---------+---------+---------+
|64091|  5%|-|-5600.975098| 1.000000| 1.000000| 0.810449|
+----+----+-+---------+---------+---------+---------+
|64092|  5%|-|-5601.975098| 1.000000| 1.000000| 0.810449|
+----+----+-+---------+---------+---------+---------+
|64093|  6%|-|-5600.975098| 1.000000| 1.000000| 0.810449|
+----+----+-+---------+---------+---------+---------+
|64094|  6%|-|-5599.975098| 1.000000| 1.000000| 0.810449|
+----+----+-+---------+---------+---------+---------+
|64095|  6%|-|-5600.975098| 1.000000| 1.000000| 0.810449|
+----+----+-+---------+---------+---------+---------+
|64096|  6%|-|-5599.975098| 1.000000| 1.000000| 0.810449|
+----+----+-+---------+---------+---------+---------+
|64097|  6%|-|-5598.975098| 1.000000| 1.000000| 0.810449|
+----+----+-+---------+---------+---------+---------+
|64098|  6%|-|-5597.975098| 1.000000| 1.000000| 0.810449|
+----+----+-+---------+---------+---------+---------+
|64099|  5%|-|-5596.975098| 1.000000| 1.000000| 0.810449|
+----+----+-+---------+---------+---------+---------+
|64100|  6%|-|-5595.975098| 1.000000| 1.000000| 0.810449|
+----+----+-+---------+---------+---------+---------+
|64101|  5%|-|-5595.975098| 1.000000| 1.000000| 0.810449|
+----+----+-+---------+---------+---------+---------+
|64102|  5%|-|-5596.975098| 1.000000| 1.000000| 0.810449|
+----+----+-+---------+---------+---------+---------+
|64103|  5%|-|-5596.975098| 1.000000| 1.000000| 0.810449|
+----+----+-+---------+---------+---------+---------+
|64104|  5%|-|-5596.975098| 1.000000| 1.000000| 0.810449|
+----+----+-+---------+---------+---------+---------+
|64105|  5%|-|-5597.975098| 1.000000| 1.000000| 0.810449|
+----+----+-+---------+---------+---------+---------+
|64106|  6%|-|-5596.975098| 1.000000| 1.000000| 0.810449|
+----+----+-+---------+---------+---------+---------+
|64107|  5%|-|-5595.975098| 1.000000| 1.000000| 0.810449|
+----+----+-+---------+---------+---------+---------+
|64108|  5%|-|-5596.975098| 1.000000| 1.000000| 0.810449|
+----+----+-+---------+---------+---------+---------+
|64109|  5%|-|-5595.975098| 1.000000| 1.000000| 0.810449|
+----+----+-+---------+---------+---------+---------+
|64110|  5%|-|-5594.975098| 1.000000| 1.000000| 0.810449|
+----+----+-+---------+---------+---------+---------+
|64111|  5%|-|-5595.975098| 1.000000| 1.000000| 0.810449|
+----+----+-+---------+---------+---------+---------+
|64112|  5%|-|-5596.975098| 1.000000| 1.000000| 0.810449|
+----+----+-+---------+---------+---------+---------+
|64113|  5%|-|-5596.975098| 1.000000| 1.000000| 0.810449|
+----+----+-+---------+---------+---------+---------+
|64114|  5%|-|-5597.975098| 1.000000| 1.000000| 0.810449|
+----+----+-+---------+---------+---------+---------+
|64115|  5%|-|-5596.975098| 1.000000| 1.000000| 0.810449|
+----+----+-+---------+---------+---------+---------+
|64116|  5%|-|-5596.975098| 1.000000| 1.000000| 0.810449|
+----+----+-+---------+---------+---------+---------+
|64117|  5%|-|-5597.975098| 1.000000| 1.000000| 0.810449|
+----+----+-+---------+---------+---------+---------+
|64118|  5%|-|-5597.975098| 1.000000| 1.000000| 0.810449|


I wonder whether this is normal or whether I should stop the process.

Can you help me?

Afterwards, I will try the traincascade process, which seems to be newer and maybe better than haartraining (as I read in this post: here).


Here is what the process does, based on %SMP:

• %SMP stands, as you say, for the percentage of samples.
• During the first stage a single feature is used to classify all images, so you get 100% usage of the samples at that moment.
• Next, all the samples that were classified wrongly by that feature are passed to the next feature, which again classifies them as positive or negative.
• This continues, so the percentage shows the share of samples that the weak classifier in the previous step did not classify correctly.

Does this make any sense?


I'm not sure I understand. So %SMP indicates the percentage of samples that could not be classified, and we try to handle them with the next feature until all of them are handled? I am worried because my process has been running for 1 week and has been at 5% for 4 days, as the trace in my question shows...

But I wonder whether that is good?

( 2013-03-13 03:38:28 -0500 )

Those percentages are specific to each stage. However, if you think about it, 5% at the final stage (let's say stage 30) will not represent 5% of your original data. It will be (5%)^(number of stages). So basically what this is saying is that you will never be able to create a classifier that covers 100% of the samples. However, reaching 99.99% of the samples is possible with enough training. Again, don't let yourself be distracted by the duration; haartraining can take far more than 4 days :)
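To put numbers on that formula, here is a minimal sketch that takes the comment's (5%)^(number of stages) rule at face value. The function name `remaining_fraction` is made up for illustration and is not part of OpenCV.

```python
def remaining_fraction(per_stage_fraction: float, num_stages: int) -> float:
    """Fraction of the original data left after num_stages stages,
    assuming every stage keeps the same per_stage_fraction (the rule
    of thumb from the comment, not a value read from haartraining)."""
    return per_stage_fraction ** num_stages


# Print how quickly the fraction shrinks over the first few stages.
for stages in (1, 2, 5):
    print(stages, remaining_fraction(0.05, stages))
```

Under that assumption, two stages at 5% already leave only 0.25% of the original data.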

( 2013-03-13 03:57:36 -0500 )

Haartraining uses Haar-like wavelets to describe characteristics of the training images. Given that a 24 x 24 pixel sample has about 180,000 features of this kind, you can see that the process AdaBoost uses to choose the best feature for the next step can be quite computationally expensive.
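A rough way to see where a number of that order comes from is to count every position and scale of the basic Haar-like shapes inside the window. The sketch below assumes only the five upright shapes (two-rectangle horizontal and vertical, three-rectangle horizontal and vertical, and four-rectangle). The feature set haartraining actually uses depends on its mode and may include other shapes, so the exact count will differ.

```python
def count_haar_features(window: int = 24) -> int:
    """Count placements of the five basic upright Haar-like shapes
    in a window x window sample (an assumed feature set, see above)."""
    # Smallest (width, height) of each shape; larger versions are
    # integer multiples of these base sizes.
    base_shapes = [(2, 1), (1, 2), (3, 1), (1, 3), (2, 2)]
    total = 0
    for base_w, base_h in base_shapes:
        for w in range(base_w, window + 1, base_w):
            for h in range(base_h, window + 1, base_h):
                # Number of top-left positions where a w x h shape fits.
                total += (window - w + 1) * (window - h + 1)
    return total


print(count_haar_features(24))  # prints 162336
```

For a 24 x 24 window this gives 162,336, the same order of magnitude as the figure quoted above.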

That said, 3 days is still quite fast for haartraining, given that you used quite a large training set. I have seen trainings with 1000 positives and 5000 negatives take more than a week to complete. So be patient, I would say.

If you decide to stop training at this point, kill the process, then run the training again with the same destination folder and with 1 stage fewer than the number of stages already trained. It will just read the parameter files and create a new XML cascade model from them. This gives you the chance to actually try your detector.
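As a sketch of that rerun, the helper below only builds the argument list and does not start training. The helper name, the file names, and the count of 15 finished stages are placeholders for illustration. The executable name `opencv_haartraining` may differ on your install, and the other arguments should match the ones used for the original run.

```python
def haartraining_command(data_dir, vec_file, bg_file, npos, nneg,
                         trained_stages, width, height):
    """Build a haartraining command that asks for one stage fewer
    than the number already trained, as suggested above."""
    nstages = trained_stages - 1
    return [
        "opencv_haartraining",
        "-data", data_dir,        # same destination folder as before
        "-vec", vec_file,
        "-bg", bg_file,
        "-npos", str(npos),
        "-nneg", str(nneg),
        "-nstages", str(nstages),
        "-w", str(width),
        "-h", str(height),
    ]


# Hypothetical paths; sample counts and size taken from the thread.
cmd = haartraining_command("cascade_dir", "positives.vec", "negatives.txt",
                           6122, 3019, trained_stages=15, width=32, height=32)
print(" ".join(cmd))
```

The printed line can be pasted into a shell, or the list passed to `subprocess.run(cmd)`, once the paths are replaced with real ones.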

As for traincascade, it's the newer training system in OpenCV for cascade models. The big difference is that you can also use LBP and HOG features to train your model. The advantage of LBP is that it is much faster to train: large datasets are done 10 times faster.

So my suggestion is: first settle on a good amount of data for your detector by using the LBP trainer. Then, when your sample set is good enough to give a robust detector, spend some time on training with HAAR.
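A sketch of that LBP-first workflow: the helper below builds an `opencv_traincascade` argument list with a selectable `-featureType`. The helper name, paths, and sample counts are placeholders, and which feature types are accepted depends on your OpenCV version.

```python
def traincascade_command(data_dir, vec_file, bg_file, num_pos, num_neg,
                         num_stages, width, height, feature_type="LBP"):
    """Build an opencv_traincascade command for one feature type."""
    # Feature types named in this thread; newer builds may accept fewer.
    if feature_type not in ("HAAR", "LBP", "HOG"):
        raise ValueError("unknown feature type: " + feature_type)
    return [
        "opencv_traincascade",
        "-data", data_dir,
        "-vec", vec_file,
        "-bg", bg_file,
        "-numPos", str(num_pos),
        "-numNeg", str(num_neg),
        "-numStages", str(num_stages),
        "-featureType", feature_type,
        "-w", str(width),
        "-h", str(height),
    ]


# Fast LBP run first to check the data set, then the slower HAAR run.
for ftype in ("LBP", "HAAR"):
    print(" ".join(traincascade_command(
        "cascade_" + ftype.lower(), "positives.vec", "negatives.txt",
        1000, 2000, 20, 24, 24, feature_type=ftype)))
```

Writing each run to its own output folder keeps the LBP and HAAR stage files from mixing.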

Any more questions, feel free to ask.


Thanks for your complete answer, it's very interesting. Indeed, I use -npos 6122 -nneg 3019 with 32x32 pixel samples, which explains the long processing time. How do you calculate the number of features (180,000) that you mention for 24x24? So I will let the process run. Another question: when we choose -featureType LBP, are HOG (Histogram of Oriented Gradients) features also used to train the model? (I have other questions that I will keep in mind for the moment...)

( 2013-03-08 03:10:16 -0500 )

With that number of samples I would suggest allowing for 1-2 weeks, especially since the higher the stage, the longer the training takes. One hint: increase the number of negatives relative to positives in order to reduce the number of false positives. You want to model the background variation as well as possible in order to remove those false positives. Concerning the featureType option, I guess you can only select a single feature type at once; I haven't tried multiple inputs yet. I have a system here to try it out, so I will let you know if it is even possible. But as I see it, you have three possible options: HAAR, LBP, or HOG. Feel free to post more questions, and be so kind as to accept my answer if you feel it helped you.

( 2013-03-08 03:24:53 -0500 )

Thanks Steven. I have a question about %SMP, which goes down (100% -> 5%) as the processing progresses... What does it mean? I understand that %SMP is the percentage of samples used, isn't it? So the process doesn't use all the samples as it runs... See you later.

( 2013-03-11 11:15:49 -0500 )
