Matching images from scientific papers using SIFT: sub-image enumeration leads to a high false positive rate

asked 2016-05-28 13:22:51 -0600

user3403849

I'm searching for duplicates within a group of images that were extracted from scientific papers. The problem is that some of these images carry panel enumerations, i.e. "A", "B", ..., which lead to a high false positive rate. I'm using SIFT for detection and description, and I filter the matches with a ratio test, cross-check validation and RANSAC. But a lot of false positives still remain.
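For reference, the ratio test and cross-check steps described above can be sketched roughly as follows. This is a minimal, brute-force NumPy sketch, not the poster's actual code: it assumes descriptors are given as float arrays of shape (n, d) (as SIFT produces), and the ratio of 0.75 is an assumed value, not one stated in the question.

```python
import numpy as np

def ratio_test(d1, d2, ratio=0.75):
    """Lowe-style ratio test: map i -> j when d2[j] is the nearest neighbour
    of d1[i] and is clearly closer than the second-nearest neighbour."""
    # Pairwise Euclidean distances, shape (len(d1), len(d2)).
    dists = np.linalg.norm(d1[:, None, :] - d2[None, :, :], axis=2)
    matches = {}
    if d2.shape[0] < 2:
        return matches  # the test needs at least two candidates
    for i, row in enumerate(dists):
        order = np.argsort(row)
        best, second = order[0], order[1]
        # Reject ambiguous matches whose best distance is not clearly smaller.
        if row[best] < ratio * row[second]:
            matches[i] = int(best)
    return matches

def filtered_matches(d1, d2, ratio=0.75):
    """Cross-check: keep only pairs that survive the ratio test in both directions."""
    forward = ratio_test(d1, d2, ratio)
    backward = ratio_test(d2, d1, ratio)
    return sorted((i, j) for i, j in forward.items() if backward.get(j) == i)

# Tiny 2-D example: the third descriptor of d2 is equidistant from all of d1,
# so it is ambiguous in the backward direction and is dropped by the cross-check.
d1 = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
d2 = np.array([[0.1, 0.0], [10.0, 0.2], [5.0, 5.0]])
print(filtered_matches(d1, d2))
```

In an OpenCV pipeline, the surviving pairs would then typically go through a geometric check such as `cv2.findHomography(src_pts, dst_pts, cv2.RANSAC, 5.0)`, whose returned mask marks the RANSAC inliers. Note that none of these steps looks at *what* the matched region contains, so a few consistent matches on an identical "A" label can pass all three filters.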

Here are three examples of unwanted matches: [image: example one] [image: example one] [image: example one]

Any ideas on how I can remove these false duplicates with minimal negative impact on the true positive rate?


Comments

IMHO, you're on the wrong path. Feature/descriptor matching is for finding a known thing in a scene, not for distinguishing different things (that would be object recognition).

berak (2016-05-29 02:29:25 -0600)