I read both Lowe's papers ('99&'04) and I would say I understood most of them. I saw all SIFT related classes on youtube, but none explicitly says why we would use both octaves and layers?
I perfectly understood that you get more layers in the same octave by ~calculating the ~Laplacian for different sigmas and then you resample to half the resolution to get the next octave, and again ~calculating the ~Laplacian for the same sigmas as in the first octave. And then you do this as many times as you feel like doing it.
Initially, I thought that you use the layers (multiple sigmas) to find features of different sizes on one image, and then you resample, so that you calculate descriptors on every octave (resampling level) for every feature, so that you get descriptors at different scales that might be better matches for descriptors in the other image at a similar scale. Apparently, I was wrong, only one descriptor is calculated for every feature, as it is calculated out of gradient orientations, so it is ~invariant to scale anyway.
But this leaves me wondering, why do we need to resample, why can't or shouldn't just use a high number of layers and just one octave (no resampling). Is this just because it is cheaper to resample? If yes, why don't we just resample?
Note: ~ sign means sort of. I use it when I know it is not the exact explanation, but the exact one would be longer and it wouldn't add any value to the question.