Here is a rough sketch of a system that you can consider:

1) First, decide which features to use to represent each face, e.g. Local Binary Patterns (LBP), Fisherfaces, etc.

2) Detect faces in all the images in your database if they are not already face images.

3) Preprocess the face images appropriately. This involves resizing the images to the same size, converting them to the same colorspace (e.g. grayscale), face alignment etc.

4) For each preprocessed face image, compute the feature vector that represents it. In this step, you might want to use a data structure (e.g. a dictionary) that maps each feature vector (most likely via its index in the array where it is stored) to the image the face was found in.

5) At this stage, each face found in the images in your database is represented by a vector. A direct approach would be to convert the faces in your input image to feature vectors, compare them to every feature vector in your database, and look for the K nearest ones using Euclidean distance (or any other distance measure). This might be what Philipp has in mind in his answer. You can then map those vectors back to the images they appear in using the data structure from step 4.

6) There are a number of plausible methods to "improve" on the matching process outlined in step 5. One direct way would be to use Approximate Nearest Neighbor (ANN) matching instead of direct matching. OpenCV provides an interface to FLANN (Fast Library for Approximate Nearest Neighbors), which you can use for this.
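Before switching to an approximate method, it helps to have the exact search from steps 4 and 5 as a baseline to compare against. The following is only a minimal sketch using NumPy: the function names `build_index` and `k_nearest`, the file names, and the toy 3-dimensional vectors are all made up for illustration, and real features (e.g. LBP histograms) would be much longer.

```python
import numpy as np

def build_index(features_by_image):
    """Stack all face vectors into one matrix and remember their source images.

    features_by_image: dict mapping an image name to a list of 1-D feature
    vectors, one per face found in that image (hypothetical layout).
    """
    vectors, owners = [], []
    for image_name, face_vectors in features_by_image.items():
        for vec in face_vectors:
            vectors.append(np.asarray(vec, dtype=np.float64))
            owners.append(image_name)  # row i of the matrix belongs to owners[i]
    return np.vstack(vectors), owners

def k_nearest(query, matrix, owners, k=3):
    """Return the k closest database faces as (image_name, distance) pairs."""
    query = np.asarray(query, dtype=np.float64)
    dists = np.linalg.norm(matrix - query, axis=1)  # Euclidean distance to every row
    order = np.argsort(dists)[:k]                   # indices of the k smallest distances
    return [(owners[i], float(dists[i])) for i in order]

# Toy data: one face in img_a.png, two faces in img_b.png.
database = {
    "img_a.png": [[0.0, 0.0, 1.0]],
    "img_b.png": [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0]],
}
matrix, owners = build_index(database)
matches = k_nearest([1.0, 0.0, 0.1], matrix, owners, k=2)
```

Here the `owners` list plays the role of the mapping from step 4: the row index of a vector is enough to look up the image it came from. The cost of each query grows linearly with the number of faces in the database, which is what the approximate methods try to avoid.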

Another approach is to use Locality-Sensitive Hashing (LSH), where you "hash" each of your input face vectors to find its nearest neighbors. I'm a bit fuzzy on the details of this one, so I cannot help much, but you can probably find tutorials on LSH easily.
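To give a rough idea of what the hashing looks like, here is a small sketch of one common LSH variant, random-hyperplane hashing. This is my own illustrative choice, not necessarily the variant a given library uses: the class name `HyperplaneLSH` and its parameters are hypothetical, the hash is designed for angular (cosine) similarity, and it assumes the feature vectors are roughly centered around the origin. Candidates that land in the same bucket are then re-ranked by Euclidean distance.

```python
import numpy as np
from collections import defaultdict

class HyperplaneLSH:
    """Toy single-table LSH index using random hyperplanes (illustrative only)."""

    def __init__(self, dim, n_bits=8, seed=0):
        rng = np.random.default_rng(seed)
        # Each row is the normal of one random hyperplane through the origin.
        self.planes = rng.standard_normal((n_bits, dim))
        self.buckets = defaultdict(list)  # hash key -> indices of stored vectors
        self.vectors = []

    def _key(self, vec):
        # One bit per hyperplane: which side of the plane the vector lies on.
        bits = (self.planes @ vec) > 0
        return bits.tobytes()

    def add(self, vec):
        vec = np.asarray(vec, dtype=np.float64)
        self.vectors.append(vec)
        self.buckets[self._key(vec)].append(len(self.vectors) - 1)

    def query(self, vec, k=1):
        """Return indices of up to k stored vectors from the query's bucket."""
        vec = np.asarray(vec, dtype=np.float64)
        candidates = self.buckets.get(self._key(vec), [])
        ranked = sorted(candidates,
                        key=lambda i: np.linalg.norm(self.vectors[i] - vec))
        return ranked[:k]

# Random stand-ins for face feature vectors.
rng = np.random.default_rng(1)
faces = rng.standard_normal((200, 32))
index = HyperplaneLSH(dim=32, n_bits=8)
for f in faces:
    index.add(f)
hit = index.query(faces[17], k=1)  # querying with a stored vector finds itself
```

Only the vectors in the matching bucket are compared, so a query touches a small fraction of the database. The price is that a true neighbor can fall into a different bucket and be missed; practical systems usually use several hash tables to reduce that risk.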

A reason for using methods such as ANN and LSH is that they will speed up your matching process considerably. They also help alleviate the "Curse of Dimensionality" problem, which arises when you try to compare high-dimensional vectors directly, and might give better matches. In case you find it counter-intuitive that approximate methods might be more accurate than direct comparisons, you are in good company :).
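One symptom of the curse of dimensionality is that distances "concentrate": as the dimension grows, the nearest and the farthest point end up almost equally far from the query, so the notion of a nearest neighbor becomes less meaningful. The snippet below is a small experiment on uniformly random points, not on real face features; the function name `relative_contrast` and the chosen sizes are just for illustration.

```python
import numpy as np

def relative_contrast(dim, n_points=1000, seed=0):
    """(farthest - nearest) / nearest distance from a random query point."""
    rng = np.random.default_rng(seed)
    points = rng.random((n_points, dim))  # uniform samples in the unit hypercube
    query = rng.random(dim)
    dists = np.linalg.norm(points - query, axis=1)
    return (dists.max() - dists.min()) / dists.min()

low = relative_contrast(2)      # low-dimensional case: large contrast
high = relative_contrast(1000)  # high-dimensional case: contrast shrinks
```

Comparing `low` and `high` shows how much less the nearest neighbor stands out in the high-dimensional case.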

I have only outlined a rough sketch of one possible system. No doubt there are many other possible approaches. Hope the above is useful to you :).