Scale invariance

I'm working on a project that involves a lot of bounding box detection. I can detect the objects I'm looking for reasonably reliably using a chain of transformations, but changing the scale of the input image tends to throw everything off. For example, to find the largest contour I might scale the image down and then do this:

import cv2
import numpy

def find_largest_contour(img):
    # Grayscale, smooth slightly, then edge-detect
    bw_image = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    blurred_image = cv2.GaussianBlur(bw_image, (3, 3), 0)
    canny_image = cv2.Canny(blurred_image, 50, 150)
    # Thicken the edges, then close small gaps with a rectangular kernel
    kernel = numpy.ones((5, 5), numpy.uint8)
    dilated_image = cv2.dilate(canny_image, kernel, iterations=1)
    kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (9, 9))
    closed_image = cv2.morphologyEx(dilated_image, cv2.MORPH_CLOSE, kernel)
    # Smooth the closed mask before extracting contours
    blurred_image = cv2.GaussianBlur(closed_image, (3, 3), 0)
    # [-2:] keeps this working on both OpenCV 3 (three return values) and OpenCV 4 (two)
    contours, hierarchy = cv2.findContours(blurred_image, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)[-2:]
    return max(contours, key=cv2.contourArea)
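
For context, the scale-down step I mention above happens before this function runs. A rough sketch of what I mean is below; the 640-pixel working width and the detect_at_reduced_scale wrapper are just illustrative, and the bounding box found at the reduced scale gets divided back out to original-image coordinates:

import cv2

TARGET_WIDTH = 640  # arbitrary working width, for illustration only

def detect_at_reduced_scale(img):
    # Shrink the input to a fixed working width, keeping the aspect ratio
    scale = TARGET_WIDTH / float(img.shape[1])
    small = cv2.resize(img, None, fx=scale, fy=scale, interpolation=cv2.INTER_AREA)
    # Run the pipeline above on the reduced image
    contour = find_largest_contour(small)
    # Map the bounding box back to the original resolution
    x, y, w, h = cv2.boundingRect(contour)
    return (int(x / scale), int(y / scale), int(w / scale), int(h / scale))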

This works fine on the scaled-down image, but not at all on the full-resolution one. To get the full image working I end up changing a lot of the parameters through trial and error, which of course throws off detection on the scaled image again. I'm just looking for some general advice. Do people usually normalize their images to the same size/resolution when performing tasks like this? Are there other techniques that cope better with images at different scales?