I'm searching for duplicates within a group of images that were extracted from scientific papers. The problem is that some of those images contain enumeration labels, e.g. "A", "B", ..., which leads to a high false positive rate. I'm using SIFT for detection and description, with a ratio test, cross-check validation, and RANSAC for match filtering, but a lot of false positives still remain.
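For reference, the ratio test and cross check mentioned above can be sketched roughly as follows. This is a simplified, pure-Python illustration rather than the actual pipeline: the function names, the plain-tuple descriptors, and the 0.75 ratio threshold are all assumptions made for the example, and the SIFT extraction and RANSAC steps are left out.

```python
import math


def nearest_two(query, candidates):
    """Return (best_index, best_distance, second_best_distance) for one descriptor."""
    dists = sorted((math.dist(query, c), j) for j, c in enumerate(candidates))
    best_dist, best_idx = dists[0]
    # With a single candidate there is no second neighbour to compare against.
    second_dist = dists[1][0] if len(dists) > 1 else math.inf
    return best_idx, best_dist, second_dist


def ratio_matches(desc_a, desc_b, ratio=0.75):
    """Ratio test: keep i -> j only if the best match is clearly better than the runner-up."""
    matches = {}
    if not desc_b:
        return matches
    for i, d in enumerate(desc_a):
        j, best, second = nearest_two(d, desc_b)
        if best < ratio * second:  # ratio value is an assumed, commonly used default
            matches[i] = j
    return matches


def cross_checked_matches(desc_a, desc_b, ratio=0.75):
    """Cross check: keep (i, j) only if i -> j and j -> i both survive the ratio test."""
    forward = ratio_matches(desc_a, desc_b, ratio)
    backward = ratio_matches(desc_b, desc_a, ratio)
    return sorted((i, j) for i, j in forward.items() if backward.get(j) == i)


if __name__ == "__main__":
    # Toy 2-D "descriptors"; real SIFT descriptors are 128-dimensional.
    a = [(0, 0), (10, 10), (5, 5)]
    b = [(10, 10.5), (0, 0.5), (100, 100)]
    print(cross_checked_matches(a, b))
```

Both filters only look at descriptor similarity, so a crisp letter such as "A" rendered in the same font in two unrelated figures passes them easily.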
Here are three examples of unwanted matches:
Any ideas on how I can remove these false duplicates with minimal negative impact on the true positive rate?