As far as I can tell, there are two different problems here.

The first is hand segmentation, and for us to help with this you should provide more information about the algorithm's working environment (luminosity, noise, background, camera position relative to the hand, tracked hand size relative to the image, etc.). I'll go on assuming that skin-colour segmentation is working well.

Then there is gesture classification.

SURF provides useful information about local gradient orientation and other things, but there is really not much point in using it when you are already able to segment the hand from the rest of the image.

When deciding on a ML approach, you should consider the simplest and fastest methods first, even more so when you are working on an embedded system. SVM works well when the classes can be separated by a hyperplane (or, with a non-linear kernel, by a curved decision surface). SVM is not a lightweight algorithm, and there is no indication that it would carry any advantage for this problem; the only way to be sure is to try.

When you have to solve a computer vision problem, the first thing you should do is think deeply about it and describe it, so let's do that:

What is a gesture? What may differentiate two different gestures of a hand? Here are the first things that come to mind:

  • Number of fingers showing
  • Holes in the hand (like when someone touches the thumb with the index finger)
  • Number of convexity defects
  • Angle between hand and arm?
  • Form factor (relationship between perimeter and area)
  • Solidity (ratio of blob area to convex-hull area)

If the distance from the hand to the camera is always roughly the same, or you can compute that distance, you can even throw in simpler features like plain area and perimeter (a fist has a smaller area than a wide-spread hand). A sketch of how these features can be extracted follows below.
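
To make that concrete, here is a minimal sketch of how those shape features could be computed with OpenCV in Python. It assumes a binary image called mask with the segmented hand in white; the variable names and the form-factor convention are my own choices, not code from the question:

    import cv2

    # mask: 8-bit binary image with the segmented hand in white (hypothetical input)
    # OpenCV 4.x returns (contours, hierarchy); 3.x returns three values
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    hand = max(contours, key=cv2.contourArea)   # assume the hand is the largest blob

    area = cv2.contourArea(hand)
    perimeter = cv2.arcLength(hand, True)       # True = treat the contour as closed
    form_factor = perimeter ** 2 / area         # one common perimeter/area relationship
    hull = cv2.convexHull(hand)
    solidity = area / cv2.contourArea(hull)     # how "filled in" the silhouette is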

So, once you know what kind of information you need, you can go on to think about which image-processing methods would obtain it. In your scenario you have already segmented the hand, which is very good. Once you run findContours() on the segmented image you can compute most of the properties mentioned above, and from the convexity defects you can already obtain the number of fingers and other information. The trick is to gather as much relevant information as possible and use it to decide which gesture the hand is making.
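
As a rough sketch, here is one way to count fingers from the convexity defects, continuing from the hand contour above. The angle and depth thresholds are guesses that you would need to tune on your own data:

    import numpy as np

    hull_idx = cv2.convexHull(hand, returnPoints=False)  # convexityDefects needs indices
    defects = cv2.convexityDefects(hand, hull_idx)

    valleys = 0
    if defects is not None:
        for start_i, end_i, far_i, depth in defects[:, 0]:
            start, end, far = hand[start_i][0], hand[end_i][0], hand[far_i][0]
            # A deep, narrow defect is usually the valley between two fingers.
            a = np.linalg.norm(end - start)
            b = np.linalg.norm(far - start)
            c = np.linalg.norm(end - far)
            cos_angle = (b ** 2 + c ** 2 - a ** 2) / (2 * b * c)
            angle = np.degrees(np.arccos(np.clip(cos_angle, -1.0, 1.0)))
            if angle < 90 and depth > 256 * 20:  # depth is in 1/256-pixel units
                valleys += 1
    fingers = valleys + 1 if valleys else 0      # n valleys -> n+1 extended fingers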

After these considerations you can decide whether ML is a good way to go. For this problem I would look at k-Nearest Neighbours: it is a very simple algorithm with an OpenCV implementation, it is light enough for embedded systems, it works well when the number of features is in the realm of the dozens, and on simpler problems it will often outperform more complicated ML engines.
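
Here is a minimal sketch of how that could look with OpenCV's kNN. The feature rows and labels are made-up placeholders; in practice each row would hold the measurements above, computed from a labelled training image:

    import cv2
    import numpy as np

    # Hypothetical training set: one row of shape features per sample
    # (e.g. fingers, convexity defects, form factor, solidity), one label per row.
    train_features = np.array([[5, 8, 45.0, 0.70],
                               [4, 6, 40.0, 0.74],
                               [0, 0, 18.0, 0.95],
                               [0, 1, 20.0, 0.93]], dtype=np.float32)
    train_labels = np.array([[0], [0], [1], [1]], dtype=np.int32)  # 0 = open hand, 1 = fist

    knn = cv2.ml.KNearest_create()
    knn.train(train_features, cv2.ml.ROW_SAMPLE, train_labels)

    # Classify a new sample by majority vote among its 3 nearest neighbours.
    sample = np.array([[4, 5, 38.0, 0.72]], dtype=np.float32)
    _, results, _, _ = knn.findNearest(sample, k=3)
    print("predicted gesture:", int(results[0][0]))

Since kNN is distance-based, remember to scale the features to comparable ranges before training.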

This answer is already long, so I won't go into a kNN explanation. I can do that in another answer if you are interested.