# How to detect a speaker from mouth facial landmarks using face_recognition

I am trying to detect whether a person in front of a webcam is speaking, using the facial landmarks I get from the face_recognition library. I can successfully get the top-lip and bottom-lip points. I want to calculate the distance between these points and, based on that distance, decide whether the person is speaking or not. Here is what I have done so far:

```python
import math

import cv2
import face_recognition

video_capture = cv2.VideoCapture(0)

while True:
    # Grab a single frame of video
    ret, frame = video_capture.read()
    if not ret:
        break

    # face_recognition expects RGB, OpenCV delivers BGR
    rgb_frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)

    # face_landmarks() returns one dict per detected face
    for face_landmarks in face_recognition.face_landmarks(rgb_frame):
        p1 = face_landmarks['top_lip']     # list of 12 (x, y) points
        p2 = face_landmarks['bottom_lip']  # list of 12 (x, y) points

        # three points near the centre of each lip
        x1, y1 = p1[8]
        x3, y3 = p1[9]
        x4, y4 = p1[10]
        x2, y2 = p2[8]
        x5, y5 = p2[9]
        x6, y6 = p2[10]

        dist = math.sqrt(((x2 + x5 + x6) - (x1 + x3 + x4)) ** 2
                         + ((y2 + y5 + y6) - (y1 + y3 + y4)) ** 2)
        print(dist)

        for point in (p1[8], p1[9], p1[10], p2[8], p2[9], p2[10]):
            cv2.circle(frame, point, 1, (255, 255, 255, 0), 2)

    cv2.namedWindow('Video', cv2.WINDOW_NORMAL)
    cv2.imshow('Video', frame)

    # Hit 'q' on the keyboard to quit!
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

video_capture.release()
cv2.destroyAllWindows()
```


But the distance I calculate varies even when the person is not speaking. If anyone has an idea how to detect a speaker using mouth landmarks, please let me know. Thanks.


• Can you explain your distance formula? (It looks pretty weird.)
• What happens before the (dlib) landmark extraction? (We can't see that from the code you show.) It's probably quite noisy.

`math.sqrt((x2-x1)**2 + (y2+y1)**2)`, that's the simple formula.

If at all: `math.sqrt((x2-x1)**2 + (y2-y1)**2)`

But that's not what your code is doing.
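As a quick sanity check, the corrected formula as a small helper function (the name is just illustrative):

```python
import math

def euclidean(p, q):
    # straight-line distance between two (x, y) points
    return math.sqrt((q[0] - p[0]) ** 2 + (q[1] - p[1]) ** 2)

print(euclidean((0, 0), (3, 4)))  # 5.0
```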

Then, there will always be some distance between the landmarks. To find out whether someone is moving their mouth, you'll need to build a time series of the distances (and maybe do some primitive frequency analysis on it).
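A minimal sketch of such a time-series check, using a simple standard-deviation threshold instead of a full frequency analysis; the `window` and `threshold` values are guesses you would have to tune for your camera and subject:

```python
from collections import deque
import statistics

def make_speaking_detector(window=15, threshold=2.0):
    # Keep the last `window` mouth-opening distances; if they fluctuate
    # strongly (high standard deviation), the mouth is probably moving.
    # Both parameters are assumptions to tune for your setup.
    history = deque(maxlen=window)

    def update(dist):
        history.append(dist)
        if len(history) < window:
            return False  # not enough samples yet
        return statistics.stdev(history) > threshold

    return update

detector = make_speaking_detector()
for d in [10.0] * 15:                      # steady mouth
    steady = detector(d)
for d in [5.0, 20.0, 6.0, 22.0, 4.0] * 3:  # oscillating mouth
    moving = detector(d)
print(steady, moving)  # False True
```

You would call the returned `update` function once per frame with the current mouth-opening distance.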

And you probably need to add up three distances (three point pairs), which is not what your code does now, for sure.
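Summing three pairwise distances could look like this sketch. The index pairs are an assumption based on face_recognition returning 12 points per lip, with bottom_lip ordered in the opposite direction to top_lip:

```python
import math

def mouth_opening(top_lip, bottom_lip):
    # Sum the gaps between three point pairs near the mouth centre.
    # The index pairs are assumptions: face_recognition returns 12 points
    # per lip contour, and bottom_lip runs in the opposite direction.
    pairs = [(8, 10), (9, 9), (10, 8)]
    return sum(
        math.hypot(bottom_lip[b][0] - top_lip[t][0],
                   bottom_lip[b][1] - top_lip[t][1])
        for t, b in pairs
    )
```

Feed the value this returns for each frame into the time-series check above, rather than thresholding a single frame's distance.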