# Speeding up the computation of SSD of 3x3 patches

Hi,

As part of a bigger application, I need to run the following code:

    ax2 += (int)(25 + 0.5);
    ay2 += (int)(25 + 0.5);

    bx2 += (int)(25 + 0.5);
    by2 += (int)(25 + 0.5);

    cx2 += (int)(25 + 0.5);
    cy2 += (int)(25 + 0.5);

    // accumulate the squared difference of corresponding pixels of patches a and b
    for (int ix = -1; ix <= 1; ix++){
        for (int iy = -1; iy <= 1; iy++){
            suma += (grayImage.at<uchar>(ay2 + iy, ax2 + ix) - grayImage.at<uchar>(by2 + iy, bx2 + ix))
                  * (grayImage.at<uchar>(ay2 + iy, ax2 + ix) - grayImage.at<uchar>(by2 + iy, bx2 + ix));
        }
    }


It computes the sum of squared differences (SSD) of two 3x3 patches.
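For reference, writing I for grayImage (indexed as I(row, col), like `at<uchar>(row, col)`), (a_x, a_y) for (ax2, ay2) and (b_x, b_y) for (bx2, by2), the value accumulated in suma is the expression below. Since every pixel is an 8-bit value, each term is at most 255², so the total is bounded as shown and fits comfortably in a 32-bit int.

```latex
\mathrm{SSD}(a,b) = \sum_{i_y=-1}^{1}\sum_{i_x=-1}^{1}
  \bigl(I(a_y + i_y,\, a_x + i_x) - I(b_y + i_y,\, b_x + i_x)\bigr)^2
  \le 9 \cdot 255^2 = 585225
```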

It runs extremely slowly. Is there any way to speed it up?

EDIT:

I changed to the following version:

    for (int ix = -1; ix <= 1; ix++){
        for (int iy = -1; iy <= 1; iy++){
            double difa = grayImage.at<uchar>(ay2 + iy, ax2 + ix) - grayImage.at<uchar>(by2 + iy, bx2 + ix);
            suma += (difa)*(difa);
        }
    }


And it runs faster, but is there any way to improve it further?

EDIT: Thanks for the answer. I'm now using the following code, which also computes the SSD between patches c and b:

    //int iy = -1;
    Mi_a = grayImage.ptr<uchar>(ay2 - 1);
    Mi_b = grayImage.ptr<uchar>(by2 - 1);
    Mi_c = grayImage.ptr<uchar>(cy2 - 1);

    difa = Mi_a[ax2 - 1] - Mi_b[bx2 - 1];
    suma += (difa)*(difa);
    difc = Mi_c[cx2 - 1] - Mi_b[bx2 - 1];
    sumc += (difc)*(difc);
    difa = Mi_a[ax2 + 0] - Mi_b[bx2 + 0];
    suma += (difa)*(difa);
    difc = Mi_c[cx2 + 0] - Mi_b[bx2 + 0];
    sumc += (difc)*(difc);
    difa = Mi_a[ax2 + 1] - Mi_b[bx2 + 1];
    suma += (difa)*(difa);
    difc = Mi_c[cx2 + 1] - Mi_b[bx2 + 1];
    sumc += (difc)*(difc);

    //int iy=0;
    Mi_a = grayImage.ptr<uchar>(ay2 + 0);
    Mi_b = grayImage.ptr<uchar>(by2 + 0);
    Mi_c = grayImage.ptr<uchar>(cy2 + 0);

    difa = Mi_a[ax2 - 1] - Mi_b[bx2 - 1];
    suma += (difa)*(difa);
    difc = Mi_c[cx2 - 1] - Mi_b[bx2 - 1];
    sumc += (difc)*(difc);
    difa = Mi_a[ax2 + 0] - Mi_b[bx2 + 0];
    suma += (difa)*(difa);
    difc = Mi_c[cx2 + 0] - Mi_b[bx2 + 0];
    sumc += (difc)*(difc);
    difa = Mi_a[ax2 + 1] - Mi_b[bx2 + 1];
    suma += (difa)*(difa);
    difc = Mi_c[cx2 + 1] - Mi_b[bx2 + 1];
    sumc += (difc)*(difc);

    //int iy=1
    Mi_a = grayImage.ptr<uchar>(ay2 + 1);
    Mi_b = grayImage.ptr<uchar>(by2 + 1);
    Mi_c = grayImage.ptr<uchar>(cy2 + 1);

    difa = Mi_a[ax2 - 1] - Mi_b[bx2 - 1];
    suma += (difa)*(difa);
    difc = Mi_c[cx2 - 1] - Mi_b[bx2 - 1];
    sumc += (difc)*(difc);
    difa = Mi_a[ax2 + 0] - Mi_b[bx2 + 0];
    suma += (difa)*(difa);
    difc = Mi_c[cx2 + 0] - Mi_b[bx2 + 0];
    sumc += (difc)*(difc);
    difa = Mi_a[ax2 + 1] - Mi_b[bx2 + 1];
    suma += (difa)*(difa);
    difc = Mi_c[cx2 + 1] - Mi_b[bx2 + 1];
    sumc += (difc)*(difc);


Is there any way to speed it up even further?

Thanks,

Gil.

P.S. It's part of an algorithm that I intend to contribute to OpenCV once it is published.


It depends on how much readability you are willing to trade for speed: the code will be faster but uglier. If time is really critical, you can:

1. Avoid the 'for' loops. Write 'suma' out explicitly as the sum of 9 terms.

2. Don't use .at<>. Instead, use pointers to access memory directly:

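    const int step = (int)grayImage.step;        // row stride in bytes, as a signed int so -step is valid
    uchar* p = grayImage.ptr(ay2) + ax2;         // centre pixel of patch a
    uchar* q = grayImage.ptr(by2) + bx2;         // centre pixel of patch b

    // difference at top left pixel of 3x3 patch
    difa = p[-step-1] - q[-step-1];

    // and so on...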


Thanks for the answer - what exactly is "step"?

(2014-06-18 11:13:36 -0600)

step is the distance, in bytes, between the beginnings of two consecutive rows. It is greater than or equal to the row width in bytes (for an 8-bit single-channel image, the width in pixels), and it is equal when the image is continuous in memory. You can read it from grayImage.step.

(2014-06-19 01:00:39 -0600)
