eye landmark points
Hi
I'm using dlib's facial landmark detector to detect eye blinks. How can the eye landmarks be exported to a file?
I need the eye landmarks to calculate the ratio between the height and the width of the eye, and then use an SVM to classify blinks.
Update: when I try to write the landmark points to a file, different values are saved than the landmarks displayed in the terminal window. How can I fix this?
Thanks
#include <dlib/opencv.h>
#include <opencv2/highgui/highgui.hpp>
#include <dlib/image_processing/frontal_face_detector.h>
#include <dlib/image_processing/render_face_detections.h>
#include <dlib/image_processing.h>
#include <dlib/gui_widgets.h>
#include <fstream> // for ofstream
#include <cmath>   // for sqrt

using namespace dlib;
using namespace std;

int main()
{
    try
    {
        cv::VideoCapture cap(0);
        if (!cap.isOpened())
        {
            cerr << "Unable to connect to camera" << endl;
            return 1;
        }

        image_window win;
        frontal_face_detector detector = get_frontal_face_detector();
        shape_predictor pose_model;
        deserialize("shape_predictor_68_face_landmarks.dat") >> pose_model;

        // Open the CSV once, before the grab loop; opening it inside the
        // loop truncated the file on every frame, so only the last rows
        // survived.
        ofstream outputfile("data1.csv");

        while (!win.is_closed())
        {
            cv::Mat temp;
            cap >> temp;
            cv_image<bgr_pixel> cimg(temp);

            // Detect faces
            std::vector<rectangle> faces = detector(cimg);

            // Find the pose of each face.
            std::vector<full_object_detection> shapes;
            for (unsigned long i = 0; i < faces.size(); ++i)
            {
                full_object_detection shape = pose_model(cimg, faces[i]);
                cout << "number of parts: " << shape.num_parts() << endl;

                cout << "Eye landmark points for the right eye:" << endl;
                for (unsigned long p = 36; p <= 41; ++p)
                    cout << "pixel position of part " << p << ": " << shape.part(p) << endl;
                cout << endl;

                cout << "Eye landmark points for the left eye:" << endl;
                for (unsigned long p = 42; p <= 47; ++p)
                    cout << "pixel position of part " << p << ": " << shape.part(p) << endl;

                // Euclidean distances between the right-eye landmark pairs
                double P37_41_x = shape.part(37).x() - shape.part(41).x();
                double P37_41_y = shape.part(37).y() - shape.part(41).y();
                double p37_41 = sqrt(P37_41_x * P37_41_x + P37_41_y * P37_41_y);

                double P38_40_x = shape.part(38).x() - shape.part(40).x();
                double P38_40_y = shape.part(38).y() - shape.part(40).y();
                double p38_40 = sqrt(P38_40_x * P38_40_x + P38_40_y * P38_40_y);

                double P36_39_x = shape.part(36).x() - shape.part(39).x();
                double P36_39_y = shape.part(36).y() - shape.part(39).y();
                double p36_39 = sqrt(P36_39_x * P36_39_x + P36_39_y * P36_39_y);

                // EAR = (||p2-p6|| + ||p3-p5||) / (2 * ||p1-p4||)
                double EAR = (p37_41 + p38_40) / (2.0 * p36_39);
                cout << "EAR value = " << EAR << endl;

                // Write the same value that is printed, so the file matches
                // the terminal output.
                outputfile << EAR << endl;

                // Reuse the detection instead of running pose_model twice.
                shapes.push_back(shape);
            }

            win.clear_overlay();
            win.set_image(cimg);
            win.add_overlay(render_face_detections(shapes));
        }
    }
    catch (serialization_error& e)
    {
        cout << "You need dlib's default face landmarking model file to run this example." << endl;
        cout << "You can get it from the following URL:" << endl;
        cout << "   http://dlib.net/files/shape_predictor_68_face_landmarks.dat.bz2" << endl;
        cout << endl << e.what() << endl;
    }
    catch (exception& e)
    {
        cout << e.what() << endl;
    }
}
did you actually try it?
i don't think that dlib's landmarks will deliver significant enough differences for open/closed eyes (but maybe i'm wrong here).
you'll still need the landmarks to detect the eye position, but imho you'll need some cropped, open/closed dataset of images to train on.
Hi, this paper: http://vision.fe.uni-lj.si/cvww2016/p... used facial landmarks to
detect the eyes; the eye aspect ratio (EAR) between the height and width of the eye is then computed.
in the end, you just need to save your EAR ratio (a single float) plus a "label" (open/closed), right?
(i'm curious how that'll work - training an SVM on a single feature)
It will only work if the classes are separable. However, in this case I would go for a Normal Bayes classifier or a KNN classifier, which do much better on low-dimensional data.
Thanks @StevenPuttemans for your suggestion. Does that mean I should use a Normal Bayes classifier or a KNN classifier on the computed eye aspect ratio (EAR)?
currently, you're saving all landmarks, but only printing out the eye ones.
why don't you calculate your EAR right there, and save that?
@berak Thanks for suggesting this. I will calculate the EAR equation, but does ||p2-p6|| mean the Euclidean distance? Any suggestion on how it can be calculated?
yes, euclidean distance (L2 norm)
I have edited the question and included the EAR calculation equation. Is it correct?
imho, you're missing braces here (eq. (1), section 2.1 in the paper)
also, don't forget the other eye ! ("Since eye blinking is performed by both eyes synchronously, the EAR of both eyes is averaged")
(btw, just curious, what kind of (labelled??) data do you have for this?)
In the paper, it says:
"A linear SVM classifier (called EAR SVM) is trained from manually annotated sequences. Positive examples are collected as ground-truth blinks, while the negatives are those that are sampled from parts of the videos where no blink occurs."
I have video data with annotations, but I have no idea how to build a classifier for the EAR from it. Do you have any suggestions?
... look at the answer below, again ?
hmm, i don't quite understand your problem, as you have everything you need.
is it "reading video"? you'd process frame by frame, and save the EAR value and the label.
Yes. In processing the frames of the annotated video, which has .tag (blinks) and .txt (frames) files, I got the EAR values computed for each frame of the annotated video sequence.
But now, how do I find the peak of an annotated blink? I don't know how to deal with the annotated video files. For example, blink 8 is annotated by the start and end of a blink, so the peak is probably the center of this interval. E.g. a blink starts at the 38th frame and ends at the 42nd, so the blink peak is at the 40th frame of the sequence. After that, I take the EAR values from the 34th-46th frames = 13 scalar numbers, and these numbers are one positive feature for training the SVM.
oh, had to read the paper again to understand what you mean...
i think you need some kind of "ring buffer" here (e.g. a std::deque<float> of size 13). for each frame, push the current EAR into one end, and pop the oldest at the other end. then use the whole 13-element vector as the feature for frame t-6 in the SVM (write a label and all 13 EARs to the csv).
(and still, for training, i'd try to do this separately for each eye (not interpolated), to get more training data, but interpolate in the prediction phase)
To TEST: in a sliding-window fashion, for each frame in a video, take the surrounding 13 EAR values and ask the SVM classifier whether these values are positive or negative. If positive, it means that the tested frame (in the center of the 13 frames) is a blink. In the annotated videos, the .tag and .txt files show the eye states and frame numbers (link). But I'm very confused about how to combine the extracted EAR values with the annotated data.
can you give a link to the data you're using?
Here is the link to it; it is the annotated data: http://www2.fiit.stuba.sk/~fogelton/a...