# Why are these two Gaussian blur sequences so different?

I'm trying to optimize this code, and in particular how the sequence of Gaussian blurs is computed.

I've rewritten the code this way (note that most of the parameters and conventions were defined by the original code's author):

```
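cv::Mat octaveLayer = //input image
int numberOfScales = 3; //user parameter
int levels = //user parameter
int scaleCycles = numberOfScales+2;
// ratio between the sigmas of consecutive scales: 2^(1/numberOfScales)
float sigmaStep = pow(2.0f, 1.0f / (float) numberOfScales);
std::vector<float> sigmas;
// sigmas[0] and sigmas[1] must already be set here (not shown):
// the loop reads sigmas[i-1] starting at i=2
for(int i=2; i<numberOfScales+2; i++){
    sigmas.push_back(sigmas[i-1]*sigmaStep);
}
// blurs[0] is unused; each level owns scaleCycles consecutive slots
vector<Mat> blurs (scaleCycles*levels+1, Mat());
for(int i=0; i<levels; i++){
    blurs[i*scaleCycles+1] = octaveLayer.clone();
    for (int j = 1; j < scaleCycles; j++){
        // incremental blur applied on top of the previous (already blurred) image
        float sigma = par.sigmas[j]* sqrt(sigmaStep * sigmaStep - 1.0f);
        blurs[j+1+i*scaleCycles] = gaussianBlur(blurs[j+i*scaleCycles], sigma);
        if(j == par.numberOfScales){
            // this blur, downsampled by 2, becomes the input of the next level
            octaveLayer = halfImage(blurs[j+1+i*scaleCycles]);
        }
    }
}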
```

Where:

```
Mat gaussianBlur(const Mat &input, const float sigma)
{
    Mat ret(input.rows, input.cols, input.type());
    // kernel spans roughly +/- 3 sigma; the size must be odd
    int size = (int)(2.0 * 3.0 * sigma + 1.0);
    if (size % 2 == 0) size++;
    GaussianBlur(input, ret, Size(size, size), sigma, sigma, BORDER_REPLICATE);
    return ret;
}

// Downsample by 2 by keeping every other row and column (expects CV_32FC1)
Mat halfImage(const Mat &input)
{
    Mat n(input.rows/2, input.cols/2, input.type());
    float *out = n.ptr<float>(0);
    for (int r = 0, ri = 0; r < n.rows; r++, ri += 2)
        for (int c = 0, ci = 0; c < n.cols; c++, ci += 2)
            *out++ = input.at<float>(ri,ci);
    return n;
}
```
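
The kernel-size rule inside `gaussianBlur` can be looked at in isolation. This is only a sketch with a hypothetical helper name (`kernelSizeForSigma` is mine, not from the original code), and it needs no OpenCV:

```cpp
// Odd kernel width covering roughly +/- 3 sigma, mirroring the rule
// used in gaussianBlur(). kernelSizeForSigma is a hypothetical name.
int kernelSizeForSigma(float sigma)
{
    int size = static_cast<int>(2.0 * 3.0 * sigma + 1.0);
    if (size % 2 == 0) size++;   // cv::GaussianBlur expects odd kernel sizes
    return size;
}
```

For example, a sigma of 1.0 gives a 7x7 kernel and a sigma of 1.5 gives 11x11. Since each kernel is cut off at about 3 sigma, every blur is only an approximation of an ideal Gaussian.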

Images are in the 0-255 value range, and this is the code used to read them:

```
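Mat tmp = imread(argv[1]);
Mat image(tmp.rows, tmp.cols, CV_32FC1, Scalar(0));
float *out = image.ptr<float>(0);
// walks the raw buffer: assumes tmp is a continuous 8-bit, 3-channel image
unsigned char *in = tmp.ptr<unsigned char>(0);
for (size_t i=tmp.rows*tmp.cols; i > 0; i--)
{
    // grayscale value = plain average of the three channels
    *out = (float(in[0]) + in[1] + in[2])/3.0f;
    out++;
    in+=3;
}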
```
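
To make the conversion explicit, here is the same arithmetic on a plain interleaved buffer, without OpenCV. It is only a sketch: `averageChannels` is a hypothetical name, and the input is assumed to hold three bytes per pixel.

```cpp
#include <cstddef>
#include <vector>

// Unweighted mean of the three channels of each pixel.
// Input: interleaved 8-bit pixels, 3 bytes each (assumption).
std::vector<float> averageChannels(const std::vector<unsigned char>& pixels)
{
    std::vector<float> gray(pixels.size() / 3);
    for (std::size_t i = 0; i < gray.size(); i++) {
        const unsigned char* px = &pixels[3 * i];
        gray[i] = (float(px[0]) + px[1] + px[2]) / 3.0f;
    }
    return gray;
}
```

Note that this is a plain average, not the weighted luminance conversion that `cv::cvtColor` applies with `COLOR_BGR2GRAY`, so the two will not produce identical grayscale images.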

I think this is all the information you need to understand the code above; please let me know otherwise.

**The code above, which is correct, generates 1760 keypoints.**

I actually don't know many details of the code above, for example how the different `sigma` values are computed or how the `sigmaStep` value is chosen; I'm using the original code for that.
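
My best reading of the sigma arithmetic in the loop, which may be wrong, is this: if an image is already blurred to `s` and the next scale should be `s * sigmaStep`, then the extra blur to apply is `sqrt((s * sigmaStep)^2 - s^2) = s * sqrt(sigmaStep^2 - 1)`. A minimal sketch of that reading, with hypothetical helper names that are not part of the original code:

```cpp
#include <cmath>
#include <vector>

// Extra blur needed to take an image already blurred to `current`
// up to `current * step`: sqrt((current*step)^2 - current^2).
float incrementalSigma(float current, float step)
{
    return current * std::sqrt(step * step - 1.0f);
}

// Absolute sigma of each scale in a geometric schedule:
// s0, s0*step, s0*step^2, ...
std::vector<float> absoluteSigmas(float s0, float step, int count)
{
    std::vector<float> out;
    float s = s0;
    for (int i = 0; i < count; i++) {
        out.push_back(s);
        s *= step;
    }
    return out;
}
```

With `step = pow(2.0f, 1.0f / 3.0f)` and a made-up starting sigma of 1.6 (not taken from the original code), the fourth entry of `absoluteSigmas` comes out at about 3.2, i.e. the sigma doubles after `numberOfScales` steps.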

Now, what I want to do is compute each blur independently, so that it can be parallelized. According to Wikipedia, this can be done with:

> Applying multiple, successive Gaussian blurs to an image has the same effect as applying a single, larger Gaussian blur, whose radius is the square root of the sum of the squares of the blur radii that were actually applied. For example, applying successive Gaussian blurs with radii of 6 and 8 gives the same results as applying a single Gaussian blur of radius 10, since $\sqrt{6^{2}+8^{2}}=10$. Because of this relationship, processing time cannot be saved by simulating a Gaussian blur with successive, smaller blurs; the time required will be at ...
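
The rule in the quote boils down to a few lines. This sketch uses a hypothetical helper name (`combinedSigma`) and does no image processing at all:

```cpp
#include <cmath>
#include <vector>

// Sigma of the single Gaussian blur equivalent to applying the given
// blurs one after another: sqrt(s1^2 + s2^2 + ...).
float combinedSigma(const std::vector<float>& steps)
{
    double sumSq = 0.0;
    for (float s : steps) sumSq += static_cast<double>(s) * s;
    return static_cast<float>(std::sqrt(sumSq));
}
```

`combinedSigma({6.0f, 8.0f})` gives 10, matching the example in the quote. To compute blur `j` of a level on its own, the sigma to use would be `combinedSigma` of all the incremental sigmas applied up to `j`. Keep in mind that the identity is exact only for ideal, continuous Gaussians; with truncated discrete kernels and border handling, the sequential and the single-blur results should be close but not bit-identical.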