The mean shift algorithm is laid out in D. Comaniciu and P. Meer, "Mean Shift: A Robust Approach Toward Feature Space Analysis", IEEE Trans. on Pattern Analysis and Machine Intelligence, Vol. 24 (2002) pp. 603-619. The functions found in the cv::cuda module implement the variant of the algorithm that uses the joint domain representation of the pixel feature space. In the joint domain representation, each pixel is described by five features: three color coordinates and two spatial coordinates. As a result of the mean shift algorithm, each pixel is assigned a point in the 5D feature space corresponding to the location of the closest local maximum of the pixel density distribution (a.k.a. the mode of the distribution).
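To make the joint-domain idea concrete, here is a minimal pure-Python sketch. It is not the cv::cuda implementation: the flat kernel, the square spatial window, the Euclidean color radius, and the function name mean_shift_joint are all assumptions made for illustration. Only the parameter names sp and sr and the output names dstr and dstsp are taken from the OpenCV functions.

```python
def mean_shift_joint(image, sp, sr, max_iter=20, eps=1e-3):
    """Flat-kernel mean shift in the joint (x, y, r, g, b) space.

    image: list of rows, each pixel an (r, g, b) tuple.
    sp: integer half-width of the square spatial window (assumed shape).
    sr: radius of the color window, Euclidean distance (assumed metric).
    Returns (dstr, dstsp): per-pixel color mode and spatial mode.
    """
    h, w = len(image), len(image[0])
    dstr = [[None] * w for _ in range(h)]
    dstsp = [[None] * w for _ in range(h)]
    for y0 in range(h):
        for x0 in range(w):
            # Start the iteration at the pixel's own 5D feature point.
            x, y = float(x0), float(y0)
            c = tuple(float(v) for v in image[y0][x0])
            for _ in range(max_iter):
                cx, cy = int(round(x)), int(round(y))
                sx = sy = 0.0
                sc = [0.0, 0.0, 0.0]
                n = 0
                # Average all pixels inside the spatial AND color windows.
                for yy in range(max(0, cy - sp), min(h, cy + sp + 1)):
                    for xx in range(max(0, cx - sp), min(w, cx + sp + 1)):
                        p = image[yy][xx]
                        if sum((p[k] - c[k]) ** 2 for k in range(3)) <= sr * sr:
                            sx += xx
                            sy += yy
                            for k in range(3):
                                sc[k] += p[k]
                            n += 1
                if n == 0:
                    break
                nx, ny = sx / n, sy / n
                nc = tuple(v / n for v in sc)
                shift = (abs(nx - x) + abs(ny - y)
                         + sum(abs(nc[k] - c[k]) for k in range(3)))
                x, y, c = nx, ny, nc
                if shift < eps:  # converged to a mode
                    break
            dstr[y0][x0] = c
            dstsp[y0][x0] = (x, y)
    return dstr, dstsp


# Toy 1x9 "image": two runs of color A separated by a run of color B.
A, B = (200, 10, 10), (10, 10, 200)
image = [[A, A, A, B, B, B, A, A, A]]
dstr, dstsp = mean_shift_joint(image, sp=2, sr=30)
```

In this toy image the two runs of color A converge to the same color mode but to different spatial modes (x = 1 and x = 7), which is exactly the information the spatial part of the feature space contributes.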

In the context of image segmentation, the mean shift procedure in the joint feature space has at least two advantages over the variant that uses only the color domain:

  • Small local features of the color distribution are not trumped by features that dominate the overall image.
  • Spatially separate segments of similar average colors are recognized as separate segments.

The dstsp matrix maps each pixel of the original image to the (approximate) location of the mode it belongs to, projected onto the spatial part of the joint feature space. All pixels belonging to the same segment are mapped to approximately the same location, within the spatial tolerance given by the input parameter sp.

Similarly, the dstr matrix maps each pixel to the approximate location of its mode in the color space.

The dstsp and dstr matrices are used jointly in the meanShiftSegmentation() function to cluster pixels into segments. Beyond this, the dstsp matrix has no particular meaning of its own: including the spatial coordinates in the mean shift procedure simply prevents spatially separate modes from being merged.
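As a rough illustration of how the two matrices can be combined, the sketch below groups pixels whose modes agree within sr in color and within sp in position. This is an assumption about the general idea, not the actual merging logic of meanShiftSegmentation(). The function cluster_modes, its use_spatial flag, and the all-pairs comparison (only practical for tiny inputs) are hypothetical, and the dstr/dstsp values are hand-written toy data.

```python
def cluster_modes(dstr, dstsp, sp, sr, use_spatial=True):
    """Group pixels whose modes coincide, using a small union-find.

    dstr:  per-pixel color mode, (r, g, b) tuples.
    dstsp: per-pixel spatial mode, (x, y) tuples.
    With use_spatial=False only the color modes are compared.
    Returns a label matrix with segment ids numbered in scan order.
    """
    h, w = len(dstr), len(dstr[0])
    pixels = [(y, x) for y in range(h) for x in range(w)]
    parent = list(range(len(pixels)))

    def find(i):
        # Find the set representative, with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    def same_mode(a, b):
        (ya, xa), (yb, xb) = pixels[a], pixels[b]
        ca, cb = dstr[ya][xa], dstr[yb][xb]
        if sum((ca[k] - cb[k]) ** 2 for k in range(3)) > sr * sr:
            return False
        if not use_spatial:
            return True
        pa, pb = dstsp[ya][xa], dstsp[yb][xb]
        return abs(pa[0] - pb[0]) <= sp and abs(pa[1] - pb[1]) <= sp

    # All-pairs comparison: quadratic, acceptable only for toy inputs.
    for a in range(len(pixels)):
        for b in range(a + 1, len(pixels)):
            if same_mode(a, b):
                parent[find(a)] = find(b)

    ids = {}
    labels = [[0] * w for _ in range(h)]
    for i, (y, x) in enumerate(pixels):
        labels[y][x] = ids.setdefault(find(i), len(ids))
    return labels


# Toy modes for a 1x9 image: two runs of color A around a run of color B.
A, B = (200.0, 10.0, 10.0), (10.0, 10.0, 200.0)
dstr = [[A] * 3 + [B] * 3 + [A] * 3]
dstsp = [[(1.0, 0.0)] * 3 + [(4.0, 0.0)] * 3 + [(7.0, 0.0)] * 3]

joint = cluster_modes(dstr, dstsp, sp=2, sr=30)
color_only = cluster_modes(dstr, dstsp, sp=2, sr=30, use_spatial=False)
```

With the spatial modes included, the two runs of color A end up in different segments. With color alone they collapse into one, which is the mixing of spatially separate modes that the joint representation avoids.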