How does the stereo SGBM algorithm really work inside?

Hello everyone,

I have been playing with stereo BM and stereo SGBM for a little while now, and although I understand in detail how the BM algorithm works, I am still struggling to understand the latter. I have tried to read all the relevant literature: "Stereo Processing by Semi-Global Matching and Mutual Information" by Heiko Hirschmüller, "Learning OpenCV 3" by Adrian Kaehler and Gary Bradski, and "Depth Discontinuities by Pixel-to-Pixel Stereo" by Stan Birchfield and Carlo Tomasi. I have also looked at many theses on the subject, yet I still haven't been able to understand how the SGBM algorithm (which is a combination of BM with a variation of SGM) really works inside.

What I mainly don't understand is the following:

1) What is the Birchfield-Tomasi metric used in the algorithm? Every paper refers to a "Birchfield-Tomasi metric", but none of them explains what it is, and reading the Birchfield-Tomasi paper didn't help me understand it.

2) How is a window used in this algorithm (what operation is performed with that window)?

3) Why is there a preFilterCap in this algorithm if the filter type can only be a Sobel (in contrast with the BM algorithm, where there is a choice between Sobel and another prefilter where preFilterCap is used)?

4) And finally, what are the different directions that can be used (3, 5 or 8)? Aren't we supposed to compute the matching cost only along the epipolar lines?
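
For reference on question 1, here is a minimal sketch of the sampling-insensitive pixel dissimilarity as defined in the Birchfield-Tomasi paper, written on 1-D scanlines in plain Python. It is only an illustration of the paper's definition, not OpenCV's actual SGBM code; the function names (`half_sample_range`, `one_sided`, `bt_cost`) and the border clamping are assumptions made for the example.

```python
def half_sample_range(row, x):
    """Min and max of the linearly interpolated intensity on [x - 0.5, x + 0.5]."""
    left = row[max(x - 1, 0)]               # clamp at the borders (assumption)
    right = row[min(x + 1, len(row) - 1)]
    centre = row[x]
    minus = 0.5 * (centre + left)           # interpolated value at x - 0.5
    plus = 0.5 * (centre + right)           # interpolated value at x + 0.5
    return min(minus, plus, centre), max(minus, plus, centre)


def one_sided(row_a, xa, row_b, xb):
    """How far row_a[xa] lies outside the interpolated range around row_b[xb]."""
    lo, hi = half_sample_range(row_b, xb)
    return max(0.0, row_a[xa] - hi, lo - row_a[xa])


def bt_cost(left_row, right_row, x, d):
    """Symmetric dissimilarity between left pixel x and right pixel x - d."""
    xr = x - d
    return min(one_sided(left_row, x, right_row, xr),
               one_sided(right_row, xr, left_row, x))


# Toy scanlines: the left row equals the right row shifted by one pixel.
left_row = [0, 10, 20, 30, 40]
right_row = [10, 20, 30, 40, 50]
print(bt_cost(left_row, right_row, 2, 1))   # cost at the true disparity
print(bt_cost(left_row, right_row, 2, 0))   # cost at a wrong disparity
```

On these toy rows the cost is 0 at the true disparity of 1, and 5 (rather than the plain absolute difference of 10) at disparity 0, because each pixel is compared against the interpolated intensity range of its counterpart rather than against a single sample.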

This is really frustrating for me: I have spent the last two full days trying to figure this out without success, and I really need to be able to explain it in my thesis.

Any help would be more than welcome!
