Hand Posture Recognition using Machine Learning

Hey guys,

I am currently working on my thesis, and for it I am trying to recognize different hand postures to control an embedded system. My first implementation used simple skin color segmentation, followed by contour extraction, the convex hull, convexity defects, and so on. This works, but I am not satisfied, since it is not very robust: the approach also detects other skin colored objects, and I don't want to compute all of these things in every frame.

Because of that I started to think about machine learning algorithms. I am considering a Support Vector Machine, but I don't really know what to use: is an SVM a good choice for hand posture recognition, or is another machine learning algorithm more appropriate? I am also unsure which features to use for training my model. Currently I use SURF features, since SIFT is too slow. Are other features more suitable? My last problem is the training set: what kind of images do I need to train my model?

So my questions are:

  • Is a Support Vector Machine a good choice for hand posture recognition?

  • What are good features to train the model?

  • What kind of images do I need for training? How many per class? Grayscale, binary, or edge images?

I know these are not very specific questions, but there is so much literature out there that I need some advice on where to look.

I am working with the OpenCV Python bindings for the image processing and use scikit-learn for the machine learning part.

Best Regards Missing

EDIT

@Pedro Batista, first of all, thank you very much for your detailed answer; I really appreciate it. The system should run in a laboratory environment: the user interacts with different devices and should be able to control some of them by hand postures/gestures. The background should be fairly stable, but it is not a simple white/black background. For the moment I assume that the user places his hand close in front of the camera.

Yesterday I made a minimal example with an SVM. I took some sample images of three different hand postures (open hand, fist, and two fingers): only 20 images per posture, at a size of 320×240. The size and distance of the hand were nearly the same in every image.

After that I segmented the hand by simple thresholding in the YCrCb color space and performed some smoothing and opening/closing operations. From the biggest contour (which I assume is the hand) I computed two features: the area of the contour and its perimeter. Computing more features, like convexity defects, angles and so on, should be no problem. I used these two features to train my SVM and got the following classification (area on the x axis, perimeter on the y axis).

[image: SVM classification plot, area on the x axis and perimeter on the y axis]

So in this case the simple classification works quite well, but so far I have only worked with an idealized situation, and I am not sure how this will perform on a live video feed. I will try k-nearest neighbours next week. I would be very interested in more information and thoughts from you, so if you could talk a little more about this approach, I would appreciate it.
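Swapping in k-nearest neighbours should be almost a one-liner with scikit-learn; a quick sketch with made-up feature clusters standing in for my (area, perimeter) data:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

# Stand-in (area, perimeter) clusters for the three postures; the real
# features come from the segmentation step, these numbers are made up.
rng = np.random.RandomState(0)
X = np.vstack([rng.normal(center, 30, size=(20, 2))
               for center in ([9000, 600], [4000, 250], [5200, 430])])
y = np.repeat([0, 1, 2], 20)  # 20 images per posture, as in my example

# Note: with unscaled features the area dominates the distance metric,
# so in practice the features should be standardized first.
knn = KNeighborsClassifier(n_neighbors=5)
scores = cross_val_score(knn, X, y, cv=5)  # quick per-classifier sanity check
```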

Best Regards Missing