Off-topic: FaceNet model phenomenon [closed]
Hello, I hope the following CNN/DNN questions are not too off-topic:
Disclaimer:
- I have read about Siamese networks and triplet loss, and understand that FaceNet is a model architecture.
- I also understand that the model openface.nn4.small2.v1.t7 used by the OpenCV demo is an implementation of the FaceNet architecture.
As an anchor image (every following picture is compared against it) I picked a frontal shot of Arnold Schwarzenegger at age 40. During testing I noticed the following:
- Slightly different angles of the same person lead to a very low similarity score.
- Different ages of the same person lead to a very low similarity score.
- I picked an image of an old woman, a young woman (the famous "Lena" picture) and a baby. To my surprise these images (which are clearly different people than Arnold) are closer to the anchor than the pictures of the same person at a different age or angle.
I am afraid I cannot ask the net why it thinks Lena is more similar to Arnold than a picture of Arnold himself at a slightly different age. Can anyone with decent knowledge comment on this "phenomenon"?
On the other hand, as long as the network recognizes the same person with a high score (0.8 seems to be a good threshold), and I can confirm this, is everything fine?
Maybe this is all just "network magic"? Any comments on this are highly welcome; maybe I should also try the TensorFlow implementation of FaceNet.
Greetings, Holger
Could you add how you calculate the "similarity score"?
Yes, of course. I made sure the detection itself works correctly (with the code you provided) and took this as a template: https://github.com/opencv/opencv/blob...
My code is too big to paste in full; I could upload the full code (80 lines) to GitHub.
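The core of it looks roughly like this (a sketch, assuming the usual OpenCV/OpenFace preprocessing with 96x96 input scaled to [0, 1]; the file names are placeholders):

```python
import cv2
import numpy as np

# load the pretrained OpenFace embedding model
net = cv2.dnn.readNetFromTorch("openface.nn4.small2.v1.t7")

def embed(face_bgr):
    # OpenFace expects 96x96 RGB input, scaled to [0, 1]
    blob = cv2.dnn.blobFromImage(face_bgr, scalefactor=1.0 / 255,
                                 size=(96, 96), mean=(0, 0, 0),
                                 swapRB=True, crop=False)
    net.setInput(blob)
    return net.forward().flatten()  # 128-D embedding, already L2-normalized

anchor = embed(cv2.imread("arnold_40_frontal.jpg"))  # placeholder file name
probe = embed(cv2.imread("probe.jpg"))               # placeholder file name

# dot product of two unit-length vectors == cosine similarity
score = float(np.dot(anchor, probe))
print("similarity score:", score)
```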
Hmm, you give me an idea: maybe I shouldn't do it all in one go, but extract the faces first, write them to disk, and take those crops as input for further analysis. I will do this.
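Something like this (a sketch using OpenCV's stock Haar cascade for the detection step; file names are placeholders):

```python
import cv2

# stock frontal-face Haar cascade shipped with OpenCV
detector = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

img = cv2.imread("input.jpg")  # placeholder file name
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
faces = detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)

# write every detected face crop to disk for later embedding/analysis
for i, (x, y, w, h) in enumerate(faces):
    cv2.imwrite("face_%d.png" % i, img[y:y + h, x:x + w])
```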
OK, if I understand you correctly (please comment):
Did I get you right? I hope so, because then I can "fix" this. Thank you, Holger
No, sorry, the dot product is OK (since the embeddings are L2-normalized; I forgot that).
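To make that concrete: for unit-length vectors the dot product and the cosine similarity coincide, so no extra normalization is needed (a small self-contained check, not code from the thread):

```python
import numpy as np

a = np.random.rand(128)
b = np.random.rand(128)
a /= np.linalg.norm(a)  # L2-normalize, like the network's embeddings
b /= np.linalg.norm(b)

dot = np.dot(a, b)
cos = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
assert np.isclose(dot, cos)  # identical for unit-length vectors
```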
OK then, I am not complaining about FaceNet at all. The positives are valid and OK (score > 0.75). I am just wondering about the negatives and their scores. From a human perspective it's really stupid.
How can Lena get a higher score than a picture of Arnold himself? If a human took a look, they could clearly tell that Lena (a beautiful woman) is obviously not more similar to the anchor than a picture of an older Arnold.
Anyway, maybe this leads to nothing and I should just accept the fact that the model contains some "magic" it learned from pixels and is not a human.
OK then: converting the input to a 3-channel black-and-white image produces more reasonable results for my little dataset.
Interesting. My theory is that this way I force the net to pay more attention to face landmarks. It's only a theory; I need to verify it on a bigger dataset (the LFW dataset). But maybe I just messed something up this way, so I will need to measure.
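For clarity, the conversion I mean is simply dropping the color and duplicating the gray channel, so the net's 3-channel input shape still fits (a sketch; the file name is a placeholder):

```python
import cv2

img = cv2.imread("probe.jpg")                   # placeholder file name
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)    # drop the color information
gray3 = cv2.cvtColor(gray, cv2.COLOR_GRAY2BGR)  # back to 3 channels for the net
# gray3 can now go through the same embedding pipeline as before
```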
Hehe, right. It's also a known fact that you can fool CNNs, e.g. with adversarial noise.
Well, I guess a neural network just "sees" things differently and is maybe right in its own world. Anyway, an interesting discussion, but I will close it.
Thank you for your input! Greetings, Holger