Ask Your Question

Revision history [back]

click to hide/show revision 1
initial version

De-duplicate images of faces

I have a group of images of faces (.jpg and .png). Some of the images are taken from the same source but have been cropped, rotated, flipped, resized, compressed, color modified, brightened, darkened, etc., and so are not exact duplicates. So the freeware I currently use will not find these 'transformed' image near-duplicates.

Since the images are 'almost,' but not quite, identical, I hope facial recognition software can help identify the duplicates. Also, I need a system that, once each face has been compared with each of the others, can tell me which files match so that I can compare the files and decide which one of the pair to keep and which one to discard.

I'm not a programmer, but my son can do the programming. I am trying to find possible tools he can use to implement a solution. (He says tools that can be integrated with Linux would be the easiest but that he can work with most anything.)

I've read the facerec_tutorial but don't really understand it. Can the opencv facial recognition software find the matches and then output some sort of data that can be used for human comparison of the files? If not, what tools would he need to use in order to take the output from the recognition software and make it useable so that a human could examine the images?

De-duplicate images of faces

I have a group of images of faces (.jpg and .png). Some of the images are taken from the same source but have been cropped, rotated, flipped, resized, compressed, color modified, brightened, darkened, etc., and so are not exact duplicates. So the freeware I currently use will not find these 'transformed' image near-duplicates.

Since the images are 'almost,' but not quite, identical, I hope facial recognition software can help identify the duplicates. Also, I need a system that, once each face has been compared with each of the others, can tell me which files match so that I can compare the files and decide which one of the pair to keep and which one to discard.

I'm not a programmer, but my son can do the programming. I am trying to find possible tools he can use to implement a solution. (He says tools that can be integrated with Linux would be the easiest but that he can work with most anything.)

I've read the facerec_tutorial but don't really understand it. Can the opencv facial recognition software find the matches and then output some sort of data that can be used for human comparison of the files? If not, what tools would he need to use in order to take the output from the recognition software and make it useable so that a human could examine the images?

Ah, being a newbie, I can't yet answer my own question, so, in answer to berak's reply below, there are about 150,000 images that need to be deduplicated.

De-duplicate images of faces

I have a group of images of faces (.jpg and .png). Some of the images are taken from the same source but have been cropped, rotated, flipped, resized, compressed, color modified, brightened, darkened, etc., and so are not exact duplicates. So the freeware I currently use will not find these 'transformed' image near-duplicates.near-duplicate images.

Since the images are 'almost,' but not quite, identical, I hope facial recognition software can help identify the duplicates. Also, I need a system that, once each face has been compared with each of the others, can tell me which files match so that I can compare the files and decide which one of the pair to keep and which one to discard.

I'm not a programmer, but my son can do the programming. I am trying to find possible tools he can use to implement a solution. (He says tools that can be integrated with Linux would be the easiest but that he can work with most anything.)

I've read the facerec_tutorial but don't really understand it. Can the opencv facial recognition software find the matches and then output some sort of data that can be used for human comparison of the files? If not, what tools would he need to use in order to take the output from the recognition software and make it useable so that a human could examine the images?

Ah, being a newbie, I can't yet answer my own question, so, in answer to berak's reply below, there are about 150,000 images that need to be deduplicated.