# Understanding OpenCV StereoBM implementation

Hello all,

I'm trying to understand the StereoBM implementation on CPU and GPU.
I started with the CPU version at https://github.com/opencv/opencv/blob....

I have some basic knowledge on how a Stereo Block Matching works.

I was able to follow the flow, but hit a hiccup at line 1204, which initializes an integer bufSize0 that is then incremented over the next few lines:

    int bufSize0 = (int)((ndisp + 2)*sizeof(int));
    bufSize0 += (int)((height + wsz + 2)*ndisp*sizeof(int));
    bufSize0 += (int)((height + wsz + 2)*sizeof(int));
    bufSize0 += (int)((height + wsz + 2)*ndisp*(wsz + 2)*sizeof(uchar) + 256);


It would be great if someone could help me understand on what basis these buffer sizes are calculated and what the buffer is used for.



I've struggled through this StereoBM code too, and while I did not reach complete enlightenment, I did come to an understanding. The data structure dynamically allocated here, whose size bufSize0 computes, is a cache of intermediate results that allows the algorithm to run much faster.

The basic StereoBM algorithm is relatively easy to comprehend, but the algorithm scales as a product of dimensions, roughly O(cols * rows * disparity_range * windowsize * windowsize). You could write a simple, straightforward algorithm to do this, and it would be correct, but it would be slow.

But, if you can use a memory buffer (appropriately sized and dimensioned based on above parameters) to cache already calculated results, then you can save a lot of time by not recalculating block match scores exhaustively.

The first optimization is to realize that many of the block matches addressed at a given column, row, disparity offset, and window offset (dx, dy) are exactly the same as a previously calculated block match score. So, cache those intermediate results and reuse them without recalculating them.

The second optimization is to realize that if you've moved just one row or column beyond a previously calculated block, then, since the block match score is a sum of absolute differences, you can start with the prior block's result, subtract the contribution of the row or column you are leaving behind, and add the contribution of the row or column you are moving toward.

The details of the indexed data structure are complicated, and its addressing is complicated, but it is performing the above sorts of optimizations to store and reuse intermediate results, vastly reducing the time that would otherwise be spent exhaustively recalculating every block match score pixel by pixel.



Hey @opalmirror, thanks for your answer. I've gone through the code completely and now have an idea of how it is implemented, but I am stuck at the final disparity calculation:

    sad[-1] = sad[1];
    sad[ndisp] = sad[ndisp - 2];
    int p = sad[mind + 1], n = sad[mind - 1];
    d = p + n - 2*sad[mind] + std::abs(p - n);
    dptr[y*dstep] = (short)(((ndisp - mind - 1 + mindisp)*256 + (d != 0 ? (p-n)*256/d : 0) + 15) >> 4);


As far as I can understand, mind is the actual disparity value for the pixel. So what is this calculation about, and why does it have to be a 16-bit fixed-point value?


As I understand it:

• mind is the minimum disparity. It is also the index into sad[] with the least difference value.
• the 1st two assignments ensure that sad[-1] and sad[ndisp] have actual values in them not smaller than sad[mind].
• The p, n, d trick seems to determine whether mind is at the left or right edge of a block of minimum disparity and needs to be smoothed.
• dptr[] is always an array of int16_t (definition of stereoBM disparity array). Units of disparity are scaled by a factor of 16.
• dptr[y*dstep] assigns the disparity value for this cell. it is max possible disparity less mind, smoothed if d shows mind at an edge, rounded up, scaled by 16.
• costptr optimizes next pass
