Creating Regions of Interest (ROI) by clustering fragmented contours

asked 2014-02-06 17:50:05 -0500

Will Stewart

updated 2020-10-28 02:45:11 -0500

Overall Objective: Create Regions of Interest (ROIs) in order to then examine them for objects such as person, dog, vehicle utilizing the Java bindings

Approach: BackgroundSubtraction -> FindContours -> downselect to Region of Interest (smallest encompassing rectangle around contours of an object) that is then sent to be classified and/or recognized.

Problem: Each object almost always produces too many fragmented contours. I've tried BackgroundSubtractorMOG and MOG2 with varying parameters (I may not have tried the right combinations) along with erode/dilate and findContours(). The contours rarely enclose the subject completely; instead, there are usually several contours that each map to part of the subject. On top of that, there are sometimes multiple objects (e.g., a person with a dog) moving through the video stream. I am not able to reliably draw a rect around the full object in order to use that smaller window for feature detection (or a classifier, HOG, etc.).

(I am addressing the shadow issue in a different thread)

My approach is leaning towards grouping contours that have some measure of nearness, though for people, the vertical elongation can be a complication for nearness calculations.

Question: Is there a method by which the nearness of contours can be evaluated, so as to group them into a larger contour/object?

Are there other approaches to solving this problem? Below are images that illustrate the issue under consideration:

Contoured image

Original image

Silhouette image


You could cluster the locations of the contours via agglomerative clustering. But is that really necessary? Imho sth like a cascade classifier should be fast enough to be applied in real time (of course needs much learning time).

Guanta ( 2014-02-07 13:40:57 -0500 )

@Guanta , So it would be doable to use a cascade classifier across the entire image in a 4+ frames per second stream, with multiple cameras at 5MP?

When you said 'sth', I'm not quite sure what you mean - Soft Cascade Classifier?

kmeans clustering seemed to hold promise, but it requires knowing how many clusters exist, and the number of objects moving through the camera FOV isn't known ahead of time.

Will Stewart ( 2014-02-08 22:04:56 -0500 )

Either a soft cascade classifier or a normal cascade classifier. However, multiple cameras at 5MP are probably too much for one PC, but maybe the drawback of downscaling the images isn't too severe; try it. And I meant agglomerative clustering, not k-means clustering: in agglomerative clustering you can specify the distance within which a cluster may be merged with its nearest neighboring cluster, so you don't need to know the number of clusters in advance. Furthermore, check out mean-shift / cam-shift! Good luck with your project!

Guanta ( 2014-02-09 05:48:19 -0500 )

@Guanta , your suggestion was most helpful - thank you!

Will Stewart ( 2014-02-17 10:47:14 -0500 )

3 answers

answered 2014-02-17 10:46:17 -0500

Will Stewart

updated 2014-03-10 15:35:05 -0500

I solved the problem of aggregating contours with a high degree of affinity (e.g., multiple contours that describe a single object) by implementing a partitioned, non-hierarchical agglomerative clustering approach, thanks to the thoughtful suggestion by @Guanta:

  • Calculate a closeness factor for all contour pairs (distance between centers of contours minus the radii of both contours)

  • Sort the pairs in ascending order (start with the most obvious pairs to cluster first, to avoid having to spend CPU bandwidth in frequent restructuring of clusters)

  • Build clusters of contours based on the closeness (see 4 conditions below).

  • Add clusters for 'solitary' contours, after evaluating for size. With a very good background detection, some objects might only result in one contour (i.e., don't throw the baby out with the bathwater by only looking at contour pairs).

This approach creates partitions of distinct clusters (e.g., contours associated with non-overlapping objects are partitioned into clusters associated with those objects respectively).

The 4 main conditions for clustering the contour pairs are:

  1. Neither are in a cluster yet, so a cluster is created and both are added to it.
  2. Both are already in the same cluster. Nothing else needs to be done for this pair.
  3. One contour is in a cluster, the other is not. The one that is not is appended to the cluster of the one that is.
  4. One contour is in one cluster, the other contour is in a different cluster. One cluster then absorbs the other, which is then removed.
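
The steps and the four conditions above could be sketched roughly as follows. This is only a minimal illustration, not the code from my implementation: `ContourClustering`, `Blob`, `closeness` and `maxCloseness` are hypothetical names, each contour is assumed to be summarized beforehand by the center and radius of an enclosing circle, and the size check for solitary contours is left out for brevity.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class ContourClustering {

    /** Hypothetical stand-in for a contour: center and radius of an enclosing circle. */
    static final class Blob {
        final double cx, cy, radius;

        Blob(double cx, double cy, double radius) {
            this.cx = cx;
            this.cy = cy;
            this.radius = radius;
        }
    }

    /** Closeness factor: distance between centers minus both radii (negative if the circles overlap). */
    static double closeness(Blob a, Blob b) {
        return Math.hypot(a.cx - b.cx, a.cy - b.cy) - a.radius - b.radius;
    }

    /** Groups blobs into clusters of blob indices; maxCloseness is an assumed tuning threshold. */
    static List<List<Integer>> cluster(List<Blob> blobs, double maxCloseness) {
        int n = blobs.size();

        // Collect all pairs that pass the nearness test as {closeness, i, j}.
        List<double[]> pairs = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            for (int j = i + 1; j < n; j++) {
                double c = closeness(blobs.get(i), blobs.get(j));
                if (c <= maxCloseness) {
                    pairs.add(new double[] {c, i, j});
                }
            }
        }
        // Ascending order: the most obvious pairs are clustered first.
        pairs.sort((p, q) -> Double.compare(p[0], q[0]));

        int[] clusterOf = new int[n];          // cluster id per blob, -1 = not clustered yet
        Arrays.fill(clusterOf, -1);
        Map<Integer, List<Integer>> clusters = new LinkedHashMap<>();
        int nextId = 0;

        for (double[] pair : pairs) {
            int i = (int) pair[1];
            int j = (int) pair[2];
            int ci = clusterOf[i];
            int cj = clusterOf[j];
            if (ci == -1 && cj == -1) {
                // Condition 1: neither is clustered, so create a new cluster for both.
                clusters.put(nextId, new ArrayList<>(Arrays.asList(i, j)));
                clusterOf[i] = nextId;
                clusterOf[j] = nextId;
                nextId++;
            } else if (ci == cj) {
                // Condition 2: both already share a cluster, nothing to do.
            } else if (ci == -1) {
                // Condition 3: i joins the cluster of j.
                clusters.get(cj).add(i);
                clusterOf[i] = cj;
            } else if (cj == -1) {
                // Condition 3 (mirrored): j joins the cluster of i.
                clusters.get(ci).add(j);
                clusterOf[j] = ci;
            } else {
                // Condition 4: the cluster of i absorbs the cluster of j, which is removed.
                List<Integer> absorbed = clusters.remove(cj);
                for (int k : absorbed) {
                    clusterOf[k] = ci;
                }
                clusters.get(ci).addAll(absorbed);
            }
        }

        // Solitary contours become single-member clusters (size filtering omitted here).
        for (int i = 0; i < n; i++) {
            if (clusterOf[i] == -1) {
                clusters.put(nextId++, new ArrayList<>(Arrays.asList(i)));
            }
        }
        return new ArrayList<>(clusters.values());
    }
}
```

With OpenCV, the center and radius of each contour could come from something like Imgproc.minEnclosingCircle(), and a call such as `ContourClustering.cluster(blobs, 20.0)` would then return one list of contour indices per cluster. The threshold value is a tuning choice, not a recommendation.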

The overall area of a cluster is examined to see if it is greater than the threshold size (we don't want to capture images of birds or cats if we actually want to trigger on people or vehicles).

Since the contours frequently do not completely encompass a moving object, the rectangle can be enlarged by an additional margin (e.g., 25% more in each direction), depending upon a number of tuning factors.
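
As a rough sketch of the size check and the ROI expansion described above: the `RoiExpansion` class, the `Box` type and the method names are hypothetical stand-ins (with OpenCV you would more likely work with Rect directly), and clamping to the image bounds is my assumption about how to keep the enlarged ROI valid.

```java
import java.util.List;

final class RoiExpansion {

    /** Minimal axis-aligned rectangle; (x, y) is the top-left corner. Hypothetical stand-in for a Rect. */
    static final class Box {
        final int x, y, width, height;

        Box(int x, int y, int width, int height) {
            this.x = x;
            this.y = y;
            this.width = width;
            this.height = height;
        }
    }

    /** Smallest rectangle enclosing all member rectangles of one cluster (list must be non-empty). */
    static Box enclosing(List<Box> members) {
        int left = Integer.MAX_VALUE, top = Integer.MAX_VALUE;
        int right = Integer.MIN_VALUE, bottom = Integer.MIN_VALUE;
        for (Box b : members) {
            left = Math.min(left, b.x);
            top = Math.min(top, b.y);
            right = Math.max(right, b.x + b.width);
            bottom = Math.max(bottom, b.y + b.height);
        }
        return new Box(left, top, right - left, bottom - top);
    }

    /** True if the cluster rectangle's area exceeds the threshold (in pixels). */
    static boolean isLargeEnough(Box r, long minArea) {
        return (long) r.width * r.height > minArea;
    }

    /** Grows the rectangle by the given fraction of its size on every side, clamped to the image. */
    static Box expand(Box r, double fraction, int imageWidth, int imageHeight) {
        int dx = (int) Math.round(r.width * fraction);
        int dy = (int) Math.round(r.height * fraction);
        int left = Math.max(0, r.x - dx);
        int top = Math.max(0, r.y - dy);
        int right = Math.min(imageWidth, r.x + r.width + dx);
        int bottom = Math.min(imageHeight, r.y + r.height + dy);
        return new Box(left, top, right - left, bottom - top);
    }
}
```

For a cluster, the flow would be `enclosing(...)` over the bounding rectangles of its contours, then `isLargeEnough(...)` to discard small objects, then `expand(box, 0.25, imageWidth, imageHeight)` to get the ROI.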

The first picture is the original image. While the black coat against the snow gives a sharp, well-defined edge, the dark mulch gives little contrast, so the pedestrian ends up split across a number of contours.

The second image shows both the contours that were generated, as well as the cluster that was calculated and drawn as a thin red bounding rectangle.

The third image shows the ROI, which in this case was derived from the cluster outline with an additional 25% on each side of the rectangle.

I've implemented this in Java, so am looking for suggestions about the best way to share this method and code with the OpenCV community.

Image 1: Original Image

Image 2: Contours with boundingRectangle (contours of pedestrian)

Image 3: ROI derived (expanded) from boundingRectangle (ROI of pedestrian)

The fourth image is of a more distant moving object partially occluded by fence boards, with consequent fragmented contours. The agglomerative clustering was performed, resulting in the boundingRectangle (thin red rectangle).

The fifth image shows how the boundingRectangle was used ... (more)


Nice work. Don't take my word for it, but the best way for you to share this is probably a GitHub code repository; GitHub is also where an official mirror of OpenCV is hosted.

Rui Marques ( 2014-02-25 14:46:06 -0500 )

Will, I'd like to talk to you about this algorithm you implemented, to see if it could work for a problem I have on my hands. Could you please contact me at [email protected] ? Best regards

Pedro Batista ( 2014-03-05 08:55:23 -0500 )

Hi Will, I'm trying to figure out a good way to improve my Motion Detection and was thinking about clustering the contours (as they are often fragmented / split in between like in your example). Your proposed method looks very promising but I am a little stuck on how to implement the actual creation of clusters efficiently (sorting & saving the necessary elements.. especially how to "mark" already clustered contours). Could you give me (a few lines of) code on how to do this? I know this is quite an old answer but maybe you can still help me out ^^ Thanks.

Appuru ( 2015-03-09 04:45:47 -0500 )

answered 2017-03-02 10:53:04 -0500

Steven P. Goldsmith

updated 2018-06-11 19:14:03 -0500

I know this thread is old, but it's a common problem solved by I've written something similar that will return multiple ROI based on moving average motion detection

Thanks Steven!

Will Stewart ( 2017-12-18 09:05:04 -0500 )

Thanks Steven! My effort was Java-based, so Latent SVM (my first choice) was unfortunately not available in OpenCV Java libraries, so I took a different route. Your second link appears to be broken at this time though I did find and P. Goldsmith

Will Stewart ( 2018-01-05 10:32:01 -0500 )

I corrected the link above. has Java code to do moving average, MOG2 and pedestrian detection.

Steven P. Goldsmith ( 2018-06-11 19:17:31 -0500 )

answered 2014-05-21 09:08:49 -0500

PedroPadua

updated 2014-05-21 18:30:38 -0500

Edit: I managed to develop my own code based on your approach, but it doesn't handle multiple objects: if contours are created in many parts of the image, my algorithm creates one really large ROI instead of multiple ones. Do you have any idea how I could include the nearness information so that only neighboring contours are merged?

Thank you!

Hi @Will Stewart. Really nice work!! Have you shared your code yet? I'm really interested in it. I'm working on a master's thesis where I'm trying to track indoor soccer players, using background subtraction for detection. Unfortunately, I'm not able to merge the numerous contours that a player has into only one bounding box. Your solution seems really promising. Could you give me more details about it, or share your Java code with me? My e-mail is: pedhenrique at gmail dot com. I'd be really grateful. Cheers!

Sorry for the delay, have been away on other matters.

The non-hierarchical approach means that there should be an agglomeration for each set of contours that passes the size and nearness tests. However, overlapping (or very close) moving objects will not be tracked as separate objects (e.g., a person walking a dog on a short leash, a close group of pedestrians, etc.).

The nearness value is the key: set too tight, so that contours must be very close, a 'sparse' set of contours is not fully combined. Set too loose, it could combine contours from unrelated objects.

My main goal for this was to identify an ROI, then apply other features to classify, track, identify, etc. For example, I would use detectMultiScale().

I am not somewhere I can access the code right now, but should be within the week.

Will Stewart ( 2014-06-26 12:32:35 -0500 )

Hi there! I have been working on the problem of combining the clusters. Can you please share the code for this problem? My email id is [email protected]

Chaitanya ( 2015-05-08 03:09:47 -0500 )

Hi there! I am working on an Android project for vehicle detection and would really need your code to make this a reality. Could you please share with me @

waleh ( 2016-05-16 05:28:05 -0500 )

Asked: 2014-02-06 17:50:05 -0500

Seen: 17,682 times

Last updated: Jun 11 '18