Usually, when you need to detect several objects in one image, you add a clustering step (mean shift, for example) right after key point detection. Key points that lie close together in the image are assumed to belong to the same object. You can then run the rest of your pipeline independently on each cluster.
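
To make the clustering step concrete, here is a minimal sketch of a flat-kernel mean shift on 2D key point coordinates, in plain Python. The function name `mean_shift`, the `bandwidth` value and the sample coordinates are only assumptions for illustration. In a real pipeline the points would come from your detector (for example the `pt` field of each `cv2.KeyPoint`), and you would probably use a library implementation such as scikit-learn's `MeanShift` instead.

```python
import math

def mean_shift(points, bandwidth, max_iter=50, tol=1e-3):
    """Flat-kernel mean shift on 2D points. Returns (labels, centers)."""
    modes = []
    for px, py in points:
        x, y = px, py
        for _ in range(max_iter):
            # Collect every point inside the window around the current estimate.
            near = [(qx, qy) for qx, qy in points
                    if math.hypot(qx - x, qy - y) <= bandwidth]
            if not near:
                break
            # Move the estimate to the mean of those points.
            nx = sum(q[0] for q in near) / len(near)
            ny = sum(q[1] for q in near) / len(near)
            shift = math.hypot(nx - x, ny - y)
            x, y = nx, ny
            if shift < tol:
                break
        modes.append((x, y))

    # Points whose modes ended up (almost) at the same place share a cluster.
    centers, labels = [], []
    for mx, my in modes:
        for i, (cx, cy) in enumerate(centers):
            if math.hypot(mx - cx, my - cy) < bandwidth / 2:
                labels.append(i)
                break
        else:
            centers.append((mx, my))
            labels.append(len(centers) - 1)
    return labels, centers

# Hypothetical key point locations: two groups far apart in the image.
points = [(10, 12), (12, 11), (11, 14), (13, 13),
          (200, 205), (202, 203), (198, 207), (201, 201)]
labels, centers = mean_shift(points, bandwidth=30)

# Group the key points by cluster; each group would then go through
# the rest of the pipeline on its own.
clusters = {}
for point, label in zip(points, labels):
    clusters.setdefault(label, []).append(point)
```

The bandwidth should be roughly the expected size of one object in pixels: if it is too large, neighbouring objects merge into one cluster, and if it is too small, a single object gets split.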

For a more detailed approach, take a look at the section "Pose estimation of multiple instances" in: Object Recognition and Full Pose Registration from a Single Image for Robotic Manipulation, A. Collet.

Hope this helps,

Guido