Actually, -w and -h are the width and height parameters of your resulting model (the detection window), not of your training data. Your training data can come in different sizes, but all annotations (object regions) will be rescaled to the predefined model size by the opencv_createsamples utility.
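To make that concrete, here is a minimal sketch of how the model size is passed to the OpenCV tools (file names and sample counts are placeholders, not from the question); note that the same -w/-h must be given to both opencv_createsamples and opencv_traincascade:

```python
# Illustrative sketch only: file names and counts are placeholders.
# The same -w/-h (model size) must be used for sample creation and training.
import subprocess

W, H = 24, 24  # model (detection window) size, not the raw image size

subprocess.run([
    "opencv_createsamples",
    "-info", "annotations.txt",   # object regions in your full-size images
    "-vec", "samples.vec",        # output: samples rescaled to W x H
    "-num", "1000",
    "-w", str(W), "-h", str(H),
], check=True)

subprocess.run([
    "opencv_traincascade",
    "-data", "cascade_dir",
    "-vec", "samples.vec",
    "-bg", "negatives.txt",
    "-numPos", "900", "-numNeg", "1800", "-numStages", "20",
    "-w", str(W), "-h", str(H),   # must match the -w/-h used above
], check=True)
```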

Why is this important? Using smaller sizes can reduce the training time drastically, while larger model sizes require a system with much more memory, since a lot of the process is kept in memory for each stage, such as the calculation of all possible features.

Knowing, for example, that a 24x24 pixel window already generates over 160,000 distinct features, and that this number grows very quickly with window size (roughly with the fourth power of the window side), you can see how a 100x100 pixel model can break down your training.
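As a rough illustration (my own sketch, not code from any OpenCV tool), counting the five classic Haar-like feature types in a square window gives 162,336 features for 24x24, matching the "over 160,000" figure, and tens of millions for 100x100:

```python
# Count the five classic Haar-like feature types (Viola-Jones style)
# that fit inside a W x H detection window. Illustrative sketch only.
def count_haar_features(W, H):
    # (unit_width, unit_height) of each base feature type:
    # two-rect horizontal/vertical, three-rect horizontal/vertical, four-rect
    base_shapes = [(2, 1), (1, 2), (3, 1), (1, 3), (2, 2)]
    total = 0
    for sw, sh in base_shapes:
        # every integer scale of the base shape that still fits the window
        for w in range(sw, W + 1, sw):
            for h in range(sh, H + 1, sh):
                # number of positions the scaled feature can occupy
                total += (W - w + 1) * (H - h + 1)
    return total

print(count_haar_features(24, 24))    # 162336  -> "over 160,000"
print(count_haar_features(100, 100))  # roughly 48 million
```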

I always suggest that my students and colleague researchers use a size beneath 50x50 pixels. If you have, for example, training data with average dimensions of 250x750 pixels, then I tell them to keep the aspect ratio but to reduce the model size to something like 25x75 pixels, or even half of that, 12x37 or so...
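If it helps, the arithmetic for keeping the aspect ratio can be written down as a tiny helper (purely my own illustration, not part of OpenCV):

```python
# Reduce average annotation dimensions by a chosen factor while keeping
# the aspect ratio. Purely illustrative helper.
def scaled_model_size(avg_w, avg_h, factor):
    return max(1, round(avg_w / factor)), max(1, round(avg_h / factor))

print(scaled_model_size(250, 750, 10))  # -> (25, 75)
print(scaled_model_size(250, 750, 20))  # -> (12, 38), close to the 12x37 above
```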

The size of your actual model defines how well features can be detected. Larger training windows will yield more features, but possibly many features that cannot be recovered at smaller scales... It is a process of trial and error before you find the correct configuration for your model.

At detection time, an image scale pyramid is built and your model window is passed over the different layers, so detection does not depend too much on the actual model scale, except for details...
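For completeness, this multi-scale step is what detectMultiScale does internally; here is a generic usage sketch (the model and image file names are placeholders, and I assume a 24x24 trained window):

```python
# Generic multi-scale detection sketch; "my_cascade.xml" and the image
# are placeholders for whatever you trained and want to test.
import cv2

cascade = cv2.CascadeClassifier("my_cascade.xml")
img = cv2.imread("test_image.jpg")
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# scaleFactor controls how fine the image pyramid is; minSize can never be
# smaller than the trained model window (here assumed to be 24x24).
objects = cascade.detectMultiScale(gray, scaleFactor=1.1,
                                   minNeighbors=3, minSize=(24, 24))

for (x, y, w, h) in objects:
    cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)
cv2.imwrite("detections.jpg", img)
```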