
Before worrying about how the bounding boxes are calculated, I'd recommend you first read this, this, this and this, which should help you understand what actually happens behind the scenes in detectMultiScale.

After reading those, you'll know that a sliding-window approach is used and that, depending on the parameters, you get multiple regions in your frame that resemble your template. A bounding box is recorded for each such region, so as the window slides across positions and scales and finds more similar features, it keeps producing boxes. In the end, a single face may be covered by several boxes overlaying one another.
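To make the sliding-window idea concrete, here is a minimal sketch in plain Python. It only enumerates candidate windows at growing sizes; it does not run a classifier, and it is not how OpenCV implements detectMultiScale internally. The function name `sliding_windows` and the parameters `base` and `step_ratio` are made up for illustration; `scale_factor` plays a role loosely analogous to detectMultiScale's scaleFactor.

```python
def sliding_windows(frame_w, frame_h, base=24, scale_factor=1.5, step_ratio=0.5):
    """Yield (x, y, w, h) square candidate windows at increasing sizes.

    Hypothetical helper: a real detector would evaluate a classifier on
    each window and keep only the windows that pass.
    """
    size = float(base)
    while size <= min(frame_w, frame_h):
        win = int(round(size))
        # Step is a fraction of the window size (an assumption of this sketch).
        step = max(1, int(win * step_ratio))
        for y in range(0, frame_h - win + 1, step):
            for x in range(0, frame_w - win + 1, step):
                yield (x, y, win, win)
        # Grow the window for the next pass over the frame.
        size *= scale_factor


windows = list(sliding_windows(96, 96))
sizes = sorted({w for (_, _, w, _) in windows})
print(len(windows), "candidate windows at sizes", sizes)
```

Because neighbouring windows overlap heavily, anything that looks like a face at one position and size will usually also pass at several nearby positions and sizes, which is where the stack of boxes comes from.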

Having multiple boxes represent a single face is redundant, so a grouping step in the spirit of non-max suppression is applied. In a nutshell, the overlapping boxes are clustered together and replaced by a single box at their average location. Look into this blog to get the general idea. Here's a follow-up link.
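As a rough illustration of the averaging idea, here is a small sketch that clusters overlapping boxes and averages each cluster. It is a simplification under stated assumptions: the names `iou` and `group_boxes` are hypothetical, overlap is measured with intersection-over-union against the first box of each cluster, and OpenCV's own grouping uses a different similarity criterion. The `min_neighbors` argument mimics the spirit of detectMultiScale's minNeighbors by dropping clusters that have too few supporting boxes.

```python
def iou(a, b):
    """Intersection-over-union of two (x, y, w, h) boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    ix = max(0, min(ax + aw, bx + bw) - max(ax, bx))
    iy = max(0, min(ay + ah, by + bh) - max(ay, by))
    inter = ix * iy
    union = aw * ah + bw * bh - inter
    return inter / union if union else 0.0


def group_boxes(boxes, iou_thresh=0.5, min_neighbors=2):
    """Greedily cluster overlapping boxes, then average each cluster."""
    clusters = []
    for box in boxes:
        for cluster in clusters:
            # Compare against the cluster's first box (assumption of this sketch).
            if iou(box, cluster[0]) >= iou_thresh:
                cluster.append(box)
                break
        else:
            clusters.append([box])

    merged = []
    for cluster in clusters:
        # Discard weakly supported detections.
        if len(cluster) >= min_neighbors:
            n = len(cluster)
            merged.append(tuple(round(sum(b[i] for b in cluster) / n) for i in range(4)))
    return merged


raw = [(100, 100, 50, 50), (104, 98, 52, 52), (96, 102, 48, 48), (300, 40, 30, 30)]
faces = group_boxes(raw)
print(faces)
```

Here the three boxes around (100, 100) collapse into one averaged box, while the isolated box is dropped because nothing else supports it. Calling `group_boxes(raw, min_neighbors=1)` would keep it.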

Depending on the parameters you pass to detectMultiScale (for example scaleFactor and minNeighbors), the location of the final averaged box will differ.