Facial recognition on a large scale
We have 80,000+ faces belonging to approximately 20,000 individual people, with more faces being added daily.
All faces are taken by professional photographers as studio portraits or at gala dinners, shows, scenic tours, etc., and are therefore not always frontal shots, evenly aligned, and so on. The faces range from babies to old, wrinkly, double-chinned ones, with the majority of people being over 40.
The purpose of the project/application is to find all faces that match each other, i.e. does face 15 match face 3765 and 1974 or not, so that a reference can be saved in a db. The facial detection is already done and is not a problem, but we struggle a fair bit with getting good, consistent results from the facial recognition. Most examples/articles deal with access control, i.e. an incoming face is compared against a db of known people in which each person occurs only once. This is different: there are no known people, no source photos to train on, etc. Just a big melting pot of faces.
Scenario:
- A picture is taken at an event
- All faces in the picture are detected and saved as individual files
- For each face found, the great pot of faces (80,000+) is scanned to find associated faces
- A reference is saved in the db so all faces for a given person can be found later on
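The scenario above can be sketched as a nearest-neighbour lookup over per-face descriptor vectors. This is only a minimal sketch under assumptions that are not in the post: each face has already been turned into a numeric descriptor by some feature extractor, `store` and `references` are plain dicts standing in for the pot of faces and the db, and `assign_person` and the `threshold` value are hypothetical and would need tuning on real data.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def assign_person(face_id, descriptor, store, references, threshold=0.3):
    """Link face_id to the person of its nearest stored face, or start a new person.

    store:      face_id -> descriptor (stand-in for the pot of faces)
    references: face_id -> person_id  (stand-in for the reference saved in the db)
    threshold:  assumed cut-off; not a recommended value
    """
    best_id, best_dist = None, float("inf")
    # Linear scan over every stored face, as in the scenario.
    for other_id, other_desc in store.items():
        dist = cosine_distance(descriptor, other_desc)
        if dist < best_dist:
            best_id, best_dist = other_id, dist

    if best_id is not None and best_dist <= threshold:
        person_id = references[best_id]            # reuse the matched person
    else:
        person_id = len(set(references.values()))  # next unused person id

    store[face_id] = descriptor
    references[face_id] = person_id
    return person_id


store, references = {}, {}
first = assign_person(15, [1.0, 0.0, 0.1], store, references)
second = assign_person(3765, [0.9, 0.1, 0.1], store, references)
third = assign_person(42, [0.0, 1.0, 0.0], store, references)
```

With these toy vectors, faces 15 and 3765 end up under the same person id and face 42 gets a new one. A full scan per face is the simplest option; at 80,000+ faces some kind of nearest-neighbour index would probably be needed instead.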
We have tried several approaches but have yet to find one that delivers consistent predictions. Any good ideas on how to go about this? We are not sure what to train the recognizer on.
Regards
Rune
So, they are all unlabelled?
IMHO, you described your situation quite nicely, but I'm still unsure what you're trying to achieve in the end.
They all have a unique id and nothing else. What we need to find out is how they relate to each other, i.e. whether face 15 matches faces 2003, 33876, ... or not.
I think what you are looking for is a clustering technique based on a descriptor of each face image, instead of the FaceRecognizer interface, which is indeed meant for database comparison. Basically, you want to cluster similar descriptors and hope that each cluster is one and the same person. Finding a good descriptor of the person will be the most difficult task here.
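To make the clustering idea concrete, here is a minimal sketch that groups faces whose descriptors lie within a distance threshold of each other, using a union-find structure (so it behaves like single-linkage clustering). Everything in it is an assumption for illustration: the descriptors are made-up 2-D vectors, `cluster_faces` is a hypothetical helper, and the threshold is arbitrary. It is one possible clustering technique, not necessarily the best one for faces.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cluster_faces(descriptors, threshold):
    """Group face ids that are linked by descriptor distances <= threshold.

    descriptors: dict face_id -> vector
    Returns a sorted list of sorted face-id lists, one list per cluster.
    """
    parent = {fid: fid for fid in descriptors}

    def find(fid):
        # Follow parent links to the cluster root, shortening the path on the way.
        while parent[fid] != fid:
            parent[fid] = parent[parent[fid]]
            fid = parent[fid]
        return fid

    ids = list(descriptors)
    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            if euclidean(descriptors[a], descriptors[b]) <= threshold:
                parent[find(a)] = find(b)  # merge the two clusters

    groups = {}
    for fid in ids:
        groups.setdefault(find(fid), []).append(fid)
    return sorted(sorted(group) for group in groups.values())


descriptors = {
    15: [0.0, 0.0],
    2003: [0.1, 0.0],
    33876: [0.05, 0.1],
    7: [5.0, 5.0],
}
clusters = cluster_faces(descriptors, threshold=0.5)
```

Here faces 15, 2003 and 33876 fall into one cluster and face 7 stays alone. The pairwise loop is quadratic in the number of faces, so it only illustrates the principle; it would not be practical as written for 80,000+ faces.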
No, you can't use OpenCV's face-recognizer API for this: it needs several labelled images of the same person to train on, and you don't have those (yet). Also, I don't even dare to imagine how it will explode when you try to compute a PCA from 80,000 images ;)
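A rough back-of-the-envelope estimate shows why a PCA over the whole set gets heavy. The numbers below assume 100x100 grayscale crops stored as 8-byte floats, which is purely an assumption (the thread does not state the image size), and `pca_memory_bytes` is a hypothetical helper, not an OpenCV function.

```python
def pca_memory_bytes(n_images, width, height, bytes_per_value=8):
    """Estimate the memory needed by the dense matrices a naive PCA would touch."""
    n_pixels = width * height
    return {
        # One row per image, one column per pixel.
        "data_matrix": n_images * n_pixels * bytes_per_value,
        # Pixel-by-pixel covariance matrix.
        "pixel_covariance": n_pixels * n_pixels * bytes_per_value,
        # Image-by-image Gram matrix (the alternative formulation).
        "image_gram_matrix": n_images * n_images * bytes_per_value,
    }


estimate = pca_memory_bytes(80_000, 100, 100)
for name, size in estimate.items():
    print(f"{name}: {size / 1e9:.1f} GB")
```

Under these assumptions the data matrix alone is about 6.4 GB, and larger crops push the pixel covariance matrix up quickly, since it grows with the square of the pixel count.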
"unsupervised face clustering" might be a good search term.