In createsamples we use the parameters -w and -h. Do these parameters have to be the size of the object, or can we use a smaller size?



Actually, -w and -h are the width and height of your resulting model, not of your training data. Your training data can have different sizes, but all annotations (object regions) will be rescaled to the predefined model size by the createsamples utility.
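To make the rescaling concrete, here is a minimal Python sketch that computes how much each annotation is scaled to reach the model size. It assumes each annotated region is resized directly to -w x -h without preserving its aspect ratio; the function name `rescale_factors` and the (x, y, width, height) box format are hypothetical, not part of OpenCV.

```python
from typing import List, Tuple

# (x, y, width, height) of one annotated object region, in pixels.
Box = Tuple[int, int, int, int]


def rescale_factors(boxes: List[Box], model_w: int, model_h: int) -> List[Tuple[float, float]]:
    """Return (sx, sy) for each box: the scaling that maps it onto model_w x model_h.

    Assumes each annotated region is resized straight to the model size, so
    sx != sy means the object's aspect ratio is changed (it gets stretched).
    """
    return [(model_w / w, model_h / h) for _, _, w, h in boxes]


if __name__ == "__main__":
    # Hypothetical annotations of very different sizes, all mapped onto -w 24 -h 24.
    boxes = [(10, 20, 240, 240), (5, 5, 48, 48), (0, 0, 120, 60)]
    for (_, _, w, h), (sx, sy) in zip(boxes, rescale_factors(boxes, 24, 24)):
        note = "" if abs(sx - sy) < 1e-9 else "  <- aspect ratio changes"
        print(f"{w}x{h} -> 24x24: sx={sx:.3f}, sy={sy:.3f}{note}")
```

Under this assumption, objects whose shape differs from w/h get stretched, so it helps to pick -w and -h with roughly the same aspect ratio as your annotations.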

Why is this important? Using smaller sizes can reduce training time drastically, while larger model sizes require a system with much more memory, since a lot of the process is kept in memory for each stage, such as the values of all possible features.

Knowing, for example, that a 24x24 pixel window can generate over 160,000 distinct features, and that this number grows very quickly with the window size (roughly with the fourth power of the side length), you can see how a 100x100 pixel model can break your training.
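The growth can be checked with a small count. This Python sketch assumes the five upright feature types of the original Viola-Jones detector (two-rectangle horizontal and vertical, three-rectangle horizontal and vertical, four-rectangle), placed at every integer cell size and position inside the window; traincascade's CORE and ALL modes add further types, so its real counts are higher. The function names are illustrative.

```python
def placements(side: int, cells: int) -> int:
    """Placements along one axis of a feature made of `cells` equal cells,
    each k pixels long, inside a window `side` pixels long, summed over all k."""
    return sum(side - k * cells + 1 for k in range(1, side // cells + 1))


def count_haar_features(win_w: int, win_h: int) -> int:
    """Count basic upright Haar-like features in a win_w x win_h window."""
    # Layout of each basic feature type in cells: (cells across, cells down).
    layouts = [(2, 1), (1, 2), (3, 1), (1, 3), (2, 2)]
    return sum(placements(win_w, cx) * placements(win_h, cy) for cx, cy in layouts)


if __name__ == "__main__":
    base = count_haar_features(24, 24)
    for side in (24, 48, 60, 100):
        n = count_haar_features(side, side)
        print(f"{side}x{side}: {n:,} features ({n / base:.1f}x the 24x24 count)")
```

For 24x24 this gives 162,336, consistent with the "over 160,000" figure; doubling the side to 48x48 multiplies the count by roughly 16, and 100x100 reaches about 48 million.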

I always advise my students and fellow researchers to use a size below 50x50 pixels. If, for example, your training data has average dimensions of 250x750 pixels, I tell them to keep the aspect ratio but reduce the samples to something like 25x75 pixels, or even half of that, around 12x37.
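The arithmetic behind that advice fits in a few lines: divide the average object size by a common factor and round down to whole pixels. The helper name `scaled_window` is hypothetical; the divisors 10 and 20 reproduce the 25x75 and 12x37 examples.

```python
from typing import Tuple


def scaled_window(avg_w: float, avg_h: float, divisor: float) -> Tuple[int, int]:
    """Shrink the average annotation size by `divisor`, keeping its aspect ratio,
    and round down to whole pixels to get candidate -w / -h values."""
    return int(avg_w // divisor), int(avg_h // divisor)


if __name__ == "__main__":
    # Example from the answer: annotations averaging 250x750 pixels.
    for divisor in (10, 20):
        w, h = scaled_window(250, 750, divisor)
        print(f"divisor {divisor}: -w {w} -h {h}, {w * h} pixels, aspect h/w = {h / w:.2f}")
```

Rounding down shifts the aspect ratio slightly (37/12 is about 3.08 instead of 3), so you may prefer to fix the short side first and derive the long side from it.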

The size of your actual model defines how well features can be detected. Larger training images yield more features, but possibly many features that cannot be retrieved at smaller scales. Finding the correct configuration for your model is a process of trial and error.

At detection time, an image scale pyramid is built and your model window is passed over the different layers, so detection does not depend too much on the actual model scale, except for fine details.
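A rough Python sketch of the pyramid idea: the window stays fixed at the model size while the image is repeatedly shrunk by a scale factor, so a layer at scale s lets the window cover objects about s times the model size in the original image. OpenCV's `CascadeClassifier.detectMultiScale` exposes this through its `scaleFactor`, `minSize` and `maxSize` arguments, but its internal loop differs in detail; `pyramid_levels` is a hypothetical helper.

```python
from typing import List, Tuple


def pyramid_levels(img_w: int, img_h: int, win_w: int, win_h: int,
                   scale_factor: float = 1.1) -> List[Tuple[float, int, int]]:
    """Return (scale, layer_w, layer_h) for every pyramid layer the fixed
    win_w x win_h window is slid over; stops once a layer is smaller than the window."""
    if scale_factor <= 1.0:
        raise ValueError("scale_factor must be greater than 1")
    levels = []
    scale = 1.0
    while True:
        layer_w, layer_h = round(img_w / scale), round(img_h / scale)
        if layer_w < win_w or layer_h < win_h:
            break
        levels.append((scale, layer_w, layer_h))
        scale *= scale_factor
    return levels


if __name__ == "__main__":
    # A 24x24 model over a 640x480 frame, halving the image at every layer.
    for scale, w, h in pyramid_levels(640, 480, 24, 24, scale_factor=2.0):
        print(f"layer {w}x{h}: window covers about {24 * scale:.0f}x{24 * scale:.0f} px of the original")
```

With a factor of 2 the same 24x24 model covers objects from 24x24 up to 384x384 pixels; real detectors use smaller factors such as 1.1 for finer scale steps.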


@StevenPuttemans That is good advice. However, there are situations where it is best to use bigger detection windows, for example detecting vehicles in aerial imagery. In contrast to detecting a face in a webcam image, where the person can be close to or far from the camera and the size of the object of interest therefore varies, in aerial imagery the vehicle dimensions do not vary enough for multiscale detection to be worthwhile. If you train your classifier with a training window big enough to contain the vehicles at their original image size, you will, granted, sacrifice training time, but not having to detect at multiple scales will significantly improve your application's performance.

(2013-11-20 04:45:57 -0500)
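Median's argument can be quantified by counting how many window positions a detector evaluates per frame. This Python sketch assumes a sliding window moved a fixed number of pixels at a time over the same shrinking pyramid a multiscale detector uses; it ignores the early rejection that makes cascades fast, so the counts are only a rough proxy for runtime. The function name and the `step` parameter are illustrative assumptions.

```python
from typing import Optional


def windows_per_frame(img_w: int, img_h: int, win_w: int, win_h: int,
                      scale_factor: Optional[float] = None, step: int = 1) -> int:
    """Count window positions evaluated on one frame.

    scale_factor=None means single-scale detection (objects already roughly the
    window size); otherwise the image is shrunk by scale_factor per layer.
    """
    if scale_factor is not None and scale_factor <= 1.0:
        raise ValueError("scale_factor must be greater than 1")
    total, scale = 0, 1.0
    while True:
        layer_w, layer_h = round(img_w / scale), round(img_h / scale)
        if layer_w < win_w or layer_h < win_h:
            break
        # Positions of the window on this layer when moved `step` pixels at a time.
        total += ((layer_w - win_w) // step + 1) * ((layer_h - win_h) // step + 1)
        if scale_factor is None:
            break
        scale *= scale_factor
    return total


if __name__ == "__main__":
    single = windows_per_frame(640, 480, 60, 60)
    multi = windows_per_frame(640, 480, 60, 60, scale_factor=1.1)
    print(f"single scale: {single:,} windows, multiscale (1.1): {multi:,} windows")
```

In OpenCV, passing `minSize` and `maxSize` close to the model size to `detectMultiScale` should restrict the search to scales near 1, which approximates the single-scale case.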

@Median, go ahead and try to train on the actual, larger images: it will simply not work unless you have an abundant amount of RAM at your disposal. The algorithm is not designed for it. Also, most of my remarks concern the training phase, not the detection phase, which is something completely different.

(2013-11-20 08:34:04 -0500)

I trained a cascade using HOG features with -w 60 and -h 60. It needs around 2 GB of RAM and performs very well on aerial vehicle detection. It runs at almost 30 fps on 640x480 images, since I'm only detecting at a single scale. On good computers with 16 GB of RAM or more, I guess it is not such a big deal to use bigger detection windows (as long as traincascade.exe is 64-bit).

(2013-11-20 08:51:43 -0500)

@StevenPuttemans, when I train my classifier and use the resulting XML file, I notice that it cannot detect anything. Do you know what I am doing wrong?

(2013-11-20 17:45:25 -0500)

I trained a Haar cascade face detector using Real AdaBoost with -w 40 and -h 40. I had -numPos 6800 and -numNeg 3000. It took three days on a Linux machine with 4x2.3 GHz processors and 7 GB of RAM. During training, 4.4 GB were occupied, even though I had set -precalcValBufSize 1024 and -precalcIdxBufSize 1024. Also, during training, some stages automatically took more than 6800 positives (6812 or so) from the vec file, which had 7000 positive samples. Ultimately, the generated cascade file wasn't very good, even when I used the CV_HAAR_SCALE_IMAGE flag during detection. I realized I needed to read the papers by Viola and Jones and by Lienhart again to better understand cascade training. It seems that, for any object, a width and height between 15 and 24 work better.

(2013-11-21 02:46:41 -0500)
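A back-of-the-envelope Python sketch shows why the precalculation buffers cannot hold everything for a 40x40 Haar model. It assumes 1,242,400 features (the count of the five basic upright Viola-Jones feature types in a 40x40 window), one 32-bit float per feature value, and 9,800 samples per stage (6,800 positives plus 3,000 negatives). My understanding is that traincascade precomputes only as many features as fit into -precalcValBufSize and evaluates the rest on the fly, but treat that as an assumption rather than a description of its internals; the helper names are hypothetical.

```python
MIB = 1024 * 1024
BYTES_PER_VALUE = 4  # assumption: each feature value is stored as a 32-bit float


def full_table_bytes(num_features: int, num_samples: int) -> int:
    """Memory needed to precompute every feature value on every sample."""
    return num_features * num_samples * BYTES_PER_VALUE


def features_in_buffer(buf_mib: int, num_features: int, num_samples: int) -> int:
    """How many features' values fit into a buffer of buf_mib MiB (capped at the total)."""
    return min(buf_mib * MIB // (BYTES_PER_VALUE * num_samples), num_features)


if __name__ == "__main__":
    num_features = 1_242_400   # basic upright Haar-like features in a 40x40 window (assumption)
    num_samples = 6800 + 3000  # -numPos + -numNeg from the comment above
    full_gib = full_table_bytes(num_features, num_samples) / 1024 ** 3
    cached = features_in_buffer(1024, num_features, num_samples)
    print(f"full table: {full_gib:.1f} GiB")
    print(f"-precalcValBufSize 1024 holds {cached:,} features ({100 * cached / num_features:.1f}%)")
```

Under these assumptions the full table would need about 45 GiB, and the 1024 MiB buffer covers only about 2% of the features. The index buffer (-precalcIdxBufSize) is not modeled here.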

@Median, agreed: if you have 16 GB of RAM at your disposal, a lot more is possible. However, I always start from a laptop setting with 4 GB of RAM, because most people don't have more than that. 60x60 is still reasonable, but raise it to 100x100 and you will see how steeply the requirements grow! @ioanna, for us to help, you need to specify all your parameters; please do this by editing your question.

(2013-11-21 03:06:00 -0500)

@ioanna, read section 5.3 and look at figure 11 in this paper: http://www.multimedia-computing.de/mediawiki/images/5/52/MRL-TR-May02-revised-Dec02.pdf

Larger input pattern sizes for cascade training are not advisable. To detect a complex pattern such as a car, you need to decide on the viewing angle: top and side views of a car will likely have a rectangular bounding box, while front and rear views could fit in a square region. Haar wavelets are always square or rectangular, so you have to construct a feature set from those wavelets that describes the complex pattern you are interested in. Finding the smallest such feature set is computationally advantageous.

(2013-11-21 04:06:53 -0500)
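To illustrate what rectangular Haar wavelets mean in practice, here is a minimal pure-Python sketch that evaluates one two-rectangle Haar-like feature with an integral image, the technique that lets any rectangle sum be computed with four lookups. The function names and the sign convention (right half minus left half) are illustrative choices, not OpenCV's implementation.

```python
from typing import List

Image = List[List[int]]


def integral_image(img: Image) -> Image:
    """ii[y][x] = sum of img over rows < y and columns < x (one row/column of zero padding)."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii: Image, x: int, y: int, w: int, h: int) -> int:
    """Sum of the w x h rectangle with top-left corner (x, y), using four lookups."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]


def haar_two_rect_horizontal(ii: Image, x: int, y: int, cell_w: int, cell_h: int) -> int:
    """Two side-by-side cells of cell_w x cell_h: right sum minus left sum."""
    left = rect_sum(ii, x, y, cell_w, cell_h)
    right = rect_sum(ii, x + cell_w, y, cell_w, cell_h)
    return right - left


if __name__ == "__main__":
    # 4x4 patch: dark left half (0), bright right half (10), i.e. a vertical edge.
    patch = [[0, 0, 10, 10] for _ in range(4)]
    ii = integral_image(patch)
    print(haar_two_rect_horizontal(ii, 0, 0, 2, 4))
```

Each feature costs the same few lookups regardless of its size, so the training cost is driven mainly by how many features exist, which ties back to the window-size discussion above.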
