
Matching images from scientific papers using SIFT: sub-image enumeration leads to a high false-positive rate

I'm searching for duplicates within a group of images that were extracted from scientific papers. The problem is that some of these images contain sub-image enumerations, e.g. panel labels such as "A", "B", ..., which leads to a high false-positive rate. I'm using SIFT for detection and description, with the ratio test, cross-check validation and RANSAC for match filtering, but many false positives still remain.

Here are three examples of unwanted matches: [image: example one]

Any ideas on how I can remove these false duplicates with minimal negative impact on the true-positive rate?