
Face embedding calculations in dlib: theory

asked 2019-03-23 00:53:33 -0500

oja

I have used dlib's face embeddings for face recognition as part of my project. Now I am writing a research paper about the project, and I can't find any documentation about the dlib library's face embedding model. The only things I was able to find are:

   1) It's based on ResNet-34.
   2) The model performs best with a distance threshold of 0.6.

FaceNet's triplet loss is different from dlib's face embedding, and even research papers (based on the few IEEE papers I looked at) don't mention how the embeddings are calculated. So my question is: how are the 128-D embedding values calculated?


1 answer


answered 2019-03-24 02:06:54 -0500

phillity

updated 2019-03-24 02:24:31 -0500

Please see this webpage --

"The network training started with randomly initialized weights and used a structured metric loss that tries to project all the identities into non-overlapping balls of radius 0.6. The loss is basically a type of pair-wise hinge loss that runs over all pairs in a mini-batch and includes hard-negative mining at the mini-batch level. The training code is obviously also available, since that sort of thing is basically the point of dlib. You can find all details on training and model specifics by reading the example program and consulting the referenced parts of dlib. There is also a Python API for accessing the face recognition model."
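To illustrate what the 0.6 radius means once the model is trained, here is a minimal Python sketch. It assumes each face has already been converted into a 128-D descriptor stored as a plain list of floats; the helper names `euclidean_distance` and `same_person` are hypothetical, not part of dlib's API, and using a strict `<` at the boundary is an arbitrary choice.

```python
import math


def euclidean_distance(a, b):
    """Euclidean distance between two descriptors of equal length."""
    if len(a) != len(b):
        raise ValueError("descriptors must have the same dimension")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def same_person(a, b, threshold=0.6):
    """Hypothetical helper: treat two faces as the same identity when their
    descriptors are closer than the training radius (0.6 by default)."""
    return euclidean_distance(a, b) < threshold


# Usage with made-up 128-D descriptors (not real model output).
anchor = [0.0] * 128
close = [0.05] * 128  # distance sqrt(128 * 0.05**2), about 0.566
far = [0.1] * 128     # distance sqrt(128 * 0.1**2), about 1.131
print(same_person(anchor, close))  # True
print(same_person(anchor, far))    # False
```

Because training pushes same-identity pairs inside this radius and different-identity pairs outside it, a single fixed threshold works as the decision rule without any per-person classifier.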

Also see the loss function code and documentation here --

"WHAT THIS OBJECT REPRESENTS
This object implements the loss layer interface defined above by EXAMPLE_LOSS_LAYER_. In particular, it allows you to learn to map objects into a vector space where objects sharing the same class label are close to each other, while objects with different labels are far apart.

To be specific, it optimizes the following loss function which considers all pairs of objects in a mini-batch and computes a different loss depending on their respective class labels. So if objects A1 and A2 in a mini-batch share the same class label then their contribution to the loss is:
    max(0, length(A1-A2) - get_distance_threshold() + get_margin())

While if A1 and B1 have different class labels then their contribution to the loss function is:
    max(0, get_distance_threshold() - length(A1-B1) + get_margin())

Therefore, this loss layer optimizes a version of the hinge loss. Moreover, the loss is trying to make sure that all objects with the same label are within get_distance_threshold() distance of each other. Conversely, if two objects have different labels then they should be more than get_distance_threshold() distance from each other in the learned embedding. So this loss function gives you a natural decision boundary for deciding if two objects are from the same class.

Finally, the loss balances the number of negative pairs relative to the number of positive pairs. Therefore, if there are N pairs that share the same identity in a mini-batch then the algorithm will only include the N worst non-matching pairs in the loss. That is, the algorithm performs hard negative mining on the non-matching pairs. This is important since there are in general way more non-matching pairs than matching pairs. So to avoid imbalance in the loss this kind of hard negative mining is useful."
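The quoted documentation can be turned into a small, simplified Python sketch. It applies the two per-pair hinge formulas to every pair in a mini-batch and then keeps only the N hardest non-matching pairs. The 0.6 threshold is the value quoted above; the 0.04 margin is an assumed value for illustration, and averaging over the included pairs is also an assumption, since the quoted text does not say how dlib scales the total. This is a sketch of the idea, not dlib's implementation.

```python
import math
from itertools import combinations


def pair_loss(a, b, same_label, threshold=0.6, margin=0.04):
    """Hinge-loss contribution of one pair, following the two quoted formulas."""
    d = math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))  # length(A - B)
    if same_label:
        # Matching pair: penalized when farther apart than threshold - margin.
        return max(0.0, d - threshold + margin)
    # Non-matching pair: penalized when closer than threshold + margin.
    return max(0.0, threshold - d + margin)


def metric_loss(embeddings, labels, threshold=0.6, margin=0.04):
    """Simplified mini-batch loss: every matching pair plus only the N
    hardest non-matching pairs, where N is the number of matching pairs."""
    pos_losses, neg_losses = [], []
    for i, j in combinations(range(len(embeddings)), 2):
        same = labels[i] == labels[j]
        loss = pair_loss(embeddings[i], embeddings[j], same, threshold, margin)
        (pos_losses if same else neg_losses).append(loss)
    # Hard negative mining: keep the N non-matching pairs with the largest loss.
    hardest_negatives = sorted(neg_losses, reverse=True)[:len(pos_losses)]
    included = pos_losses + hardest_negatives
    # Averaging over included pairs is an assumption; dlib may scale differently.
    return sum(included) / len(included) if included else 0.0


# Toy 2-D batch (real descriptors are 128-D); the values are made up.
batch = [[0.0, 0.0], [0.0, 0.8], [0.0, 0.5], [0.0, 2.0]]
labels = ["A", "A", "B", "B"]
print(round(metric_loss(batch, labels), 3))  # 0.415
```

In this toy batch there are two matching pairs and four non-matching pairs, so only the two non-matching pairs with the largest loss (distances 0.3 and 0.5) contribute; the two well-separated pairs are dropped, which is the imbalance correction the documentation describes.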

If anything is unclear, I recommend reading the DeepFace and FaceNet papers.



@phillity Thanks, I am looking into it.

oja (2019-04-19 12:56:13 -0500)

@oja if you found the answer helpful, please upvote and accept it. Good luck!

phillity (2019-04-19 18:54:16 -0500)
