Matching images from scientific papers using SIFT: sub image enumeration leads to high false positive rate
I'm searching for duplicates within a group of images that were extracted from scientific papers. The problem is that some of those images carry sub-figure labels, i.e. "A", "B", ..., and these labels lead to a high false positive rate. I'm using SIFT for detection and description, with a ratio test, cross-check validation, and RANSAC for match filtering, but a lot of false positives still remain.
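To make the filtering steps concrete, here is a minimal sketch of the ratio test followed by cross-check validation. It is written in plain Python on toy descriptor vectors rather than with my actual SIFT code; the function names and the 0.75 ratio are assumptions chosen for illustration, and the RANSAC step is left out.

```python
import math


def l2(a, b):
    """Euclidean distance between two descriptor vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def ratio_matches(query, train, ratio=0.75):
    """Return {query_idx: train_idx} for matches that pass the ratio test."""
    matches = {}
    for qi, q in enumerate(query):
        # Rank all train descriptors by distance to this query descriptor.
        dists = sorted((l2(q, t), ti) for ti, t in enumerate(train))
        if len(dists) < 2:
            continue  # the ratio test needs a second-best candidate
        (d1, best), (d2, _) = dists[0], dists[1]
        # Keep the match only if the best is clearly closer than the runner-up.
        if d1 < ratio * d2:
            matches[qi] = best
    return matches


def cross_checked_matches(desc_a, desc_b, ratio=0.75):
    """Keep only matches that agree in both directions (A->B and B->A)."""
    ab = ratio_matches(desc_a, desc_b, ratio)
    ba = ratio_matches(desc_b, desc_a, ratio)
    return sorted((i, j) for i, j in ab.items() if ba.get(j) == i)
```

Note that a sub-figure label like "A" tends to produce very distinctive, near-identical descriptors in both images, so its matches pass both of these tests easily.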
Here are three examples of unwanted matches:
Any ideas on how I can remove those false duplicates with minimal negative impact on the true positive rate?
imho, you're on the wrong path. feature/descriptor matching is for finding a known thing in a scene, not for distinguishing different things (that would be object recognition)
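If you want to stay with feature matching anyway, one heuristic that might help is to look at where the RANSAC inliers sit: when they all cluster in a tiny region (such as a sub-figure label), the match is probably driven by the label and not by the image content. The following is only a sketch of that idea. The function names and the 5% coverage threshold are hypothetical and would need tuning on real data.

```python
def bbox_coverage(points, width, height):
    """Fraction of the image area covered by the bounding box of the points."""
    if not points:
        return 0.0
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    box_area = (max(xs) - min(xs)) * (max(ys) - min(ys))
    return box_area / float(width * height)


def looks_like_label_match(inlier_points, width, height, min_coverage=0.05):
    """True if the inlier keypoints are confined to a small part of the image.

    min_coverage is an assumed threshold, not a recommended value.
    """
    return bbox_coverage(inlier_points, width, height) < min_coverage
```

For example, with an 800x600 image:

```python
label_cluster = [(10, 10), (40, 12), (25, 40), (12, 38)]
spread_out = [(50, 50), (700, 500), (400, 300)]
print(looks_like_label_match(label_cluster, 800, 600))
print(looks_like_label_match(spread_out, 800, 600))
```

The first call flags the match as label-driven and the second does not. The check would have to be applied to the inlier locations in both images of a pair, and it also rejects perfectly collinear inliers, since their bounding box has zero area.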