Well, by building multiple scales in your pyramid, you are tackling the possible variation in scale of the object you are looking for in the scene you are observing. The specific details vary from method to method, but the idea is that you want to know what a given keypoint looks like at different scales. So, if you "meet it" in the observed scene, you will be able to recognize it even if it is bigger or smaller than in the model object.

Strictly speaking, an octave is a ratio of 1/2 between the size of the original image (object model) and the size of the reduced image. That said, nothing forbids using a different ratio between two levels. The finer the step between two consecutive scales of the pyramid, the more finely you will be able to discriminate when detecting keypoints. The same goes for the number of octaves you use; it is a little bit like the range of scales you are looking at.
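To make the relation between octaves and intermediate scales concrete, here is a minimal sketch that only computes scale factors; it does not resize any image. The function name `pyramid_scales` and the parameter `levels_per_octave` are made up for illustration, and the even geometric spacing inside an octave is an assumption, not what every detector does.

```python
def pyramid_scales(n_octaves, levels_per_octave=1):
    """Return the scale factor of every pyramid level, largest image first.

    Hypothetical helper: assumes each octave halves the image size and that
    the levels inside an octave are spaced geometrically.
    """
    scales = []
    for octave in range(n_octaves):
        for level in range(levels_per_octave):
            # A full octave halves the size; intermediate levels split that
            # halving into equal multiplicative steps.
            exponent = octave + level / levels_per_octave
            scales.append(2.0 ** -exponent)
    return scales


print(pyramid_scales(3))     # [1.0, 0.5, 0.25]
print(pyramid_scales(2, 2))  # four levels, each about 0.707 times the previous one
```

With one level per octave you get the plain 1/2 ratio; raising `levels_per_octave` gives the finer steps mentioned above, at the price of more levels to process.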

So, as a rule of thumb, the more octaves you have and the less "space" between the scales, the better your algorithm will be able to discriminate the keypoints. On the other hand, for every octave that you add, you must do the same computations again, so using 7 octaves will take longer to compute than using just 2-3. Also, if you know beforehand the scale at which the searched object is present in the scene image, you can constrain your pyramid so it won't look at implausible scales.
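As a rough illustration of constraining the pyramid, the sketch below keeps only the levels whose scale factor falls inside a range you consider plausible. The helper `restrict_scales`, the candidate list, and the bounds are all hypothetical values chosen for the example.

```python
def restrict_scales(scales, min_scale, max_scale):
    """Keep only the pyramid levels inside the plausible scale range."""
    return [s for s in scales if min_scale <= s <= max_scale]


# Five octaves with a 1/2 ratio between levels (illustrative values).
candidate_scales = [1.0, 0.5, 0.25, 0.125, 0.0625]

# Assumption: the object is known to appear at roughly 20% to 60% of its model size.
kept = restrict_scales(candidate_scales, 0.2, 0.6)
print(kept)                               # [0.5, 0.25]
print(len(kept), "of", len(candidate_scales), "levels to process")
```

Fewer levels means fewer passes of the same computation, which is where the time saving comes from.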

Again, this is a simplification and does not necessarily fit every method, but this is the main idea.