How to Detect Speaker from facial landmarks of mouth using face_recognition

asked 2019-11-15 04:16:28 -0500

updated 2019-11-15 04:39:47 -0500


I am trying to detect a speaker from a webcam using facial landmarks, which I can get with the face_recognition library. I have successfully extracted the top-lip and bottom-lip points of the mouth.


I want to calculate the distance between these points; based on that distance, we may be able to say whether the person is speaking. This is what I have done so far:

import face_recognition
import cv2
import math


video_capture = cv2.VideoCapture(0)
while True:
    # Grab a single frame of video
    ret, frame = video_capture.read()

    face_landmarks = face_recognition.face_landmarks(frame)
    try:
        p1=face_landmarks[0]['top_lip']
        p2=face_landmarks[0]['bottom_lip']
        x1,y1=p1[9]
        x3,y3=p1[8]
        x4,y4=p1[10]
        x2,y2=p2[9]
        x5,y5=p2[8]
        x6,y6=p2[10]
        dist = math.sqrt(((x2+x5+x6) - (x1+x3+x4)) ** 2 + ((y2+y5+y6) - (y1+y3+y4)) ** 2)
        print(dist)
        image = cv2.circle(frame, p1[8], 1, (255, 255, 255, 0), 2)
        image = cv2.circle(frame, p1[9], 1, (255, 255, 255, 0), 2)
        image = cv2.circle(frame, p1[10], 1, (255, 255, 255, 0), 2)

        image = cv2.circle(frame, p2[8], 1, (255, 255, 255, 0), 2)
        image = cv2.circle(frame, p2[9], 1, (255, 255, 255, 0), 2)
        image = cv2.circle(frame, p2[10], 1, (255, 255, 255, 0), 2)
        # # cv2.clipLine(frame, p1, p2,(255,255,255,0), thickness=2)
        # for p1t in p1:
        #     image = cv2.circle(frame, p1t, 1, (255,255,255,0), 2)
        # for p1b in p2:
        #     image = cv2.circle(frame, p1b, 1, (255, 255, 255, 0), 2)

        cv2.namedWindow('Video', cv2.WINDOW_NORMAL)
        cv2.imshow('Video', frame)
    except Exception as e:
        raise(e)

    # Hit 'q' on the keyboard to quit!
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break



video_capture.release()
cv2.destroyAllWindows()

But the distance I calculate varies even when the person is not speaking. If anyone has an idea of how I can detect a speaker using mouth landmarks, please let me know. Thanks.


Comments

  • Can you explain your distance formula? (It looks pretty weird.)
  • What happens before the (dlib) landmark extraction? (We can't see that from the code you show.) It's probably quite noisy.
berak ( 2019-11-15 04:34:18 -0500 )
1
LBerger ( 2019-11-15 04:39:33 -0500 )

math.sqrt((x2-x1)**2 + (y2+y1)**2): that's the simple formula.

Hassan Ali ( 2019-11-15 04:48:21 -0500 )

If anything: math.sqrt((x2-x1)**2 + (y2-y1)**2)

But that's not what your code is doing.

Also, there will always be some distance between the landmarks. To find out whether someone is moving their mouth, you'll need to build a time series from the distances (and maybe do some primitive frequency analysis).

berak ( 2019-11-15 05:02:04 -0500 )
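
The time-series idea could be sketched roughly as follows. This is only an illustration, not something from the thread: the class name MouthActivityDetector, the window size, and the threshold are all made-up assumptions, and it uses the standard deviation over a sliding window as a stand-in for a real frequency analysis. The distances fed in are expected to be one lip-gap value per frame.

```python
from collections import deque
from statistics import pstdev


class MouthActivityDetector:
    """Hypothetical helper: flags mouth movement when the recent
    lip distances vary more than a chosen threshold."""

    def __init__(self, window=15, threshold=1.5):
        # window: number of recent frames to keep (assumed value)
        # threshold: minimum standard deviation, in pixels, that
        #            counts as "moving" (assumed value, needs tuning)
        self.history = deque(maxlen=window)
        self.threshold = threshold

    def update(self, distance):
        """Add one per-frame distance; return True if the mouth
        appears to be moving over the current window."""
        self.history.append(distance)
        if len(self.history) < self.history.maxlen:
            return False  # not enough frames collected yet
        return pstdev(self.history) > self.threshold
```

In the capture loop you would call detector.update(dist) once per frame instead of printing dist. A mouth that stays open or stays closed gives a nearly constant series and therefore a low deviation, which is why the variation is more informative than the raw distance.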

You probably need to add up 3 distances (one per point pair), not what you do now, for sure.

berak ( 2019-11-15 06:10:18 -0500 )
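
Adding up three pairwise distances might look like the sketch below. The function name lip_gap and the DEFAULT_PAIRS constant are hypothetical. The index pairing is an assumption based on how face_recognition appears to order the inner-lip points (top_lip and bottom_lip seem to run in opposite directions), so it is worth drawing the points and checking that each pair really sits one above the other.

```python
import math

# Assumed (top_lip index, bottom_lip index) pairs that face each other
# vertically. Verify this against your own landmark output.
DEFAULT_PAIRS = ((8, 10), (9, 9), (10, 8))


def lip_gap(top_lip, bottom_lip, pairs=DEFAULT_PAIRS):
    """Sum the Euclidean distances between paired lip points.

    top_lip and bottom_lip are lists of (x, y) tuples, e.g.
    face_landmarks[0]['top_lip'] and face_landmarks[0]['bottom_lip'].
    """
    total = 0.0
    for top_idx, bottom_idx in pairs:
        x1, y1 = top_lip[top_idx]
        x2, y2 = bottom_lip[bottom_idx]
        # Distance for one pair: sqrt((x2-x1)**2 + (y2-y1)**2)
        total += math.hypot(x2 - x1, y2 - y1)
    return total
```

This differs from the formula in the question, which sums the coordinates of the three points first and then takes a single distance. The raw sum is in pixels, so it also changes when the face moves toward or away from the camera; dividing by something like the mouth width is one possible way to reduce that effect.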