How to merge detected windows?

I've written my own pedestrian detection algorithm (I need it for research purposes, so I decided not to use the supplied HOG detector).

After detection, I get many overlapping rectangles around each detected object / human. I then apply non-maxima suppression to retain only the local maxima. However, overlapping rectangles still remain at locations outside the search range of the non-maxima suppression algorithm.

How would you merge the rectangles? I tried groupRectangles, but I don't understand how it arrives at its result (e.g. groupRectangles( rects, 1.0, 0.2 ) ).
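For context, the sketch below shows the kind of similarity test that groupRectangles appears to use when clustering, based on a reading of the OpenCV source; it may differ between versions, so treat it as an assumption. `Box` and `similarBoxes` are hypothetical names used here so the snippet compiles without OpenCV. As documented, clusters with no more than groupThreshold members are discarded and the remaining clusters are replaced by their average rectangle.

```cpp
#include <algorithm>
#include <cstdlib>

// Hypothetical stand-in for cv::Rect so the sketch compiles without OpenCV.
struct Box { int x, y, width, height; };

// Two boxes count as "similar" when all four edges differ by at most delta,
// where delta scales with eps and with the smaller width and height.
bool similarBoxes( const Box& r1, const Box& r2, double eps ) {
    double delta = eps * ( std::min( r1.width, r2.width ) +
                           std::min( r1.height, r2.height ) ) * 0.5;
    return std::abs( r1.x - r2.x ) <= delta &&
           std::abs( r1.y - r2.y ) <= delta &&
           std::abs( ( r1.x + r1.width )  - ( r2.x + r2.width ) )  <= delta &&
           std::abs( ( r1.y + r1.height ) - ( r2.y + r2.height ) ) <= delta;
}
```

With eps = 0.2 and two 100x200 boxes, delta is 30 pixels, so the boxes are grouped only if each corresponding edge is within 30 pixels. This is a test on edge positions, not on overlap area, which may explain why the results differ from an area-based merge.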

I implemented a rudimentary merging algorithm that merges two rectangles when their intersection covers at least a given fraction of the smaller rectangle's area. The code is shown below.

/**
 * Group rectangles that overlap a member of an existing group by at least the
 * specified fraction of the smaller rectangle's area, then merge each group
 * @param   boxes a set of rectangles to be merged
 * @param   overlap the minimum fraction (0..1) of the smaller rectangle's area that the intersection must cover before 2 rectangles are merged
 * @param   group_threshold only groups containing more than group_threshold rectangles are retained
 * @return  a set of merged rectangles
 **/
vector<Rect> Util::mergeRectangles( const vector<Rect>& boxes, float overlap, int group_threshold ) {
    vector<Rect> output;
    vector<Rect> intersected;
    vector< vector<Rect> > partitions;
    vector<Rect> rects( boxes.begin(), boxes.end() );

    while( rects.size() > 0 ) {
        Rect a      = rects[rects.size() - 1];
        int a_area  = a.area();
        rects.pop_back();

        if( partitions.empty() ) {
            vector<Rect> vec;
            vec.push_back( a );
            partitions.push_back( vec );
        }
        else {
            bool merge = false;
            for( int i = 0; i < partitions.size(); i++ ){

                for( int j = 0; j < partitions[i].size(); j++ ) {
                    Rect b = partitions[i][j];
                    int b_area = b.area();

                    Rect intersect = a & b;
                    int intersect_area = intersect.area();

                    // Merge when the intersection covers at least `overlap` of the smaller rectangle
                    if( intersect_area >= overlap * std::min( a_area, b_area ) )
                        merge = true;

                    if( merge )
                        break;
                }

                if( merge ) {
                    partitions[i].push_back( a );
                    break;
                }
            }

            if( !merge ) {
                vector<Rect> vec;
                vec.push_back( a );
                partitions.push_back( vec );
            }
        }
    }

    for( int i = 0; i < partitions.size(); i++ ) {
        if( partitions[i].size() <= group_threshold )
            continue;

        Rect merged = partitions[i][0];
        for( int j = 1; j < partitions[i].size(); j++ ) {
            merged |= partitions[i][j];
        }

        output.push_back( merged );

    }

    return output;
}

However, what I'd like to know is whether this is actually an accepted way to merge rectangles in computer vision, especially since I want to measure the precision and recall of my algorithm. My approach seems too simplistic at times, and the merged rectangles keep getting bigger, mainly because of merged |= partitions[i][j]; which computes the minimum rectangle that encloses both rectangles.
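One possible way to avoid the growth caused by the union is to average the rectangles of each group instead. The sketch below is only an illustration of that idea, not an established recommendation; `Box` and `averageBoxes` are hypothetical names, with `Box` standing in for cv::Rect so the snippet compiles without OpenCV.

```cpp
#include <vector>

// Hypothetical stand-in for cv::Rect so the sketch compiles without OpenCV.
struct Box { int x, y, width, height; };

// Average position and size over all boxes in one group, instead of taking
// their bounding union. Returns an empty box for an empty group.
Box averageBoxes( const std::vector<Box>& group ) {
    if( group.empty() )
        return Box{ 0, 0, 0, 0 };

    long long sum_x = 0, sum_y = 0, sum_w = 0, sum_h = 0;
    for( const Box& b : group ) {
        sum_x += b.x;
        sum_y += b.y;
        sum_w += b.width;
        sum_h += b.height;
    }

    const long long n = static_cast<long long>( group.size() );
    // Integer division truncates toward zero; acceptable for a sketch.
    return Box{ static_cast<int>( sum_x / n ), static_cast<int>( sum_y / n ),
                static_cast<int>( sum_w / n ), static_cast<int>( sum_h / n ) };
}
```

For two 10x10 boxes at (0,0) and (10,10), the union is a 20x20 box at (0,0), whereas the average is a 10x10 box at (5,5), so the merged window keeps roughly the size of the original detections.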

If this is an acceptable way to merge detection windows, what is a common value for the merging overlap (i.e. merge if the overlap area >= what percentage)?
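To make "overlap percentage" concrete, the sketch below computes the ratio my code uses (intersection divided by the smaller rectangle's area) next to intersection over union (IoU), which is a different measure that I have not used above and which is included here only for comparison. All names (`Box`, `overlapOverMin`, `intersectionOverUnion`) are hypothetical, with `Box` standing in for cv::Rect.

```cpp
#include <algorithm>

// Hypothetical stand-in for cv::Rect so the sketch compiles without OpenCV.
struct Box { int x, y, width, height; };

int area( const Box& b ) { return b.width * b.height; }

// Area of the intersection of two boxes, or 0 if they do not overlap.
int intersectionArea( const Box& a, const Box& b ) {
    int w = std::min( a.x + a.width,  b.x + b.width )  - std::max( a.x, b.x );
    int h = std::min( a.y + a.height, b.y + b.height ) - std::max( a.y, b.y );
    return ( w > 0 && h > 0 ) ? w * h : 0;
}

// Ratio used in mergeRectangles: intersection / area of the smaller box.
double overlapOverMin( const Box& a, const Box& b ) {
    int smaller = std::min( area( a ), area( b ) );
    return smaller > 0 ? static_cast<double>( intersectionArea( a, b ) ) / smaller : 0.0;
}

// Alternative measure: intersection / union.
double intersectionOverUnion( const Box& a, const Box& b ) {
    int inter = intersectionArea( a, b );
    int uni   = area( a ) + area( b ) - inter;
    return uni > 0 ? static_cast<double>( inter ) / uni : 0.0;
}
```

The two measures can disagree strongly: a small box lying entirely inside a large one scores 1.0 with the first ratio regardless of the size difference, while its IoU can be close to 0. Any threshold therefore only makes sense relative to the measure it is applied to.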