How to use an SVM (Support Vector Machine) to detect hand gestures in real time using Python and OpenCV?
I want to detect hand gestures in real time using a dataset of about 100 images (10 images for each of 10 gestures).
What I have done so far:
- I have created a dataset of 100 images.
- I have code to detect the hand.
- I'm using OpenCV 3.2.0 and Python 2.7 on Linux.
What I want to do next:
- I want to load my folder of images into the SVM.
- I want to give SIFT descriptors as input to the SVM.
- I want to predict hand gestures in real time.
I have written some code to understand SVM and tested it successfully:
import cv2
import numpy as np
from sklearn.svm import SVC
img = cv2.imread("1_1.jpg")
img = cv2.resize(img, (600,400))
sift = cv2.xfeatures2d.SIFT_create()
(kps1_1, descs1_1) = sift.detectAndCompute(img, None)
a1=np.mean(descs1_1)
img = cv2.imread("1_2.jpg")
img = cv2.resize(img, (600,400))
sift = cv2.xfeatures2d.SIFT_create()
(kps1_2, descs1_2) = sift.detectAndCompute(img, None)
a2=np.mean(descs1_2)
img = cv2.imread("1_3.jpg")
img = cv2.resize(img, (600,400))
sift = cv2.xfeatures2d.SIFT_create()
(kps1_3, descs1_3) = sift.detectAndCompute(img, None)
a3=np.mean(descs1_3)
img = cv2.imread("1_4.jpg")
img = cv2.resize(img, (600,400))
sift = cv2.xfeatures2d.SIFT_create()
(kps1_4, descs1_4) = sift.detectAndCompute(img, None)
a4=np.mean(descs1_4)
img = cv2.imread("1_5.jpg")
img = cv2.resize(img, (600,400))
sift = cv2.xfeatures2d.SIFT_create()
(kps1_5, descs1_5) = sift.detectAndCompute(img, None)
a5=np.mean(descs1_5)
img = cv2.imread("3_1.jpg")
img = cv2.resize(img, (600,400))
sift = cv2.xfeatures2d.SIFT_create()
(kps3_1, descs3_1) = sift.detectAndCompute(img, None)
b1=np.mean(descs3_1)
img = cv2.imread("3_2.jpg")
img = cv2.resize(img, (600,400))
sift = cv2.xfeatures2d.SIFT_create()
(kps3_2, descs3_2) = sift.detectAndCompute(img, None)
b2=np.mean(descs3_2)
img = cv2.imread("3_3.jpg")
img = cv2.resize(img, (600,400))
sift = cv2.xfeatures2d.SIFT_create()
(kps3_3, descs3_3) = sift.detectAndCompute(img, None)
b3=np.mean(descs3_3)
img = cv2.imread("3_4.jpg")
img = cv2.resize(img, (600,400))
sift = cv2.xfeatures2d.SIFT_create()
(kps3_4, descs3_4) = sift.detectAndCompute(img, None)
b4=np.mean(descs3_4)
img = cv2.imread("3_5.jpg")
img = cv2.resize(img, (600,400))
sift = cv2.xfeatures2d.SIFT_create()
(kps3_5, descs3_5) = sift.detectAndCompute(img, None)
b5=np.mean(descs3_5)
X = np.array([[a1], [a2], [a3], [a4], [a5], [b1], [b2], [b3] , [b4], [b5]])
y = np.array([1, 1,1, 1,1, 2, 2, 2, 2, 2])
img = cv2.imread("3_16.jpg")
img = cv2.resize(img, (600,400))
sift = cv2.xfeatures2d.SIFT_create()
(kps3_16, descs3_16) = sift.detectAndCompute(img, None)
v=np.mean(descs3_16)
clf = SVC()
clf.fit(X, y)
# Output echoed by clf.fit(X, y):
# SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0,
#   decision_function_shape=None, degree=3, gamma='auto', kernel='rbf',
#   max_iter=-1, probability=False, random_state=None, shrinking=True,
#   tol=0.001, verbose=False)
print(clf.predict([[v]]))  # predict expects a 2D array: one row per sample
It gives the following output:
[2]
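A side note on the features used above: calling np.mean on a descriptor array without an axis argument averages every entry into one scalar, so each image ends up represented by a single number. A common alternative (an assumption here, not something the code above does) is to average over the keypoints only, with axis=0, which keeps one value per SIFT dimension (128 values per image). The sketch below uses a random array as a stand-in for real SIFT descriptors, and pool_descriptors is a hypothetical helper name.

```python
import numpy as np

def pool_descriptors(descs, dim=128):
    """Average a (n_keypoints, dim) descriptor array into one dim-length vector."""
    # detectAndCompute returns None for the descriptors when no keypoints are
    # found, so fall back to a zero vector in that case.
    if descs is None or len(descs) == 0:
        return np.zeros(dim, dtype=np.float32)
    return np.asarray(descs, dtype=np.float32).mean(axis=0)

# Random stand-in for the descriptors returned by sift.detectAndCompute
# (57 keypoints, 128 values each). Not real image data.
rng = np.random.RandomState(0)
fake_descs = rng.rand(57, 128).astype(np.float32)

scalar_feature = np.mean(fake_descs)           # what the code above computes: one number
vector_feature = pool_descriptors(fake_descs)  # one value per descriptor dimension

print(np.shape(scalar_feature))  # ()
print(vector_feature.shape)      # (128,)
```

With features like these, X would have shape (n_images, 128) rather than (n_images, 1). Whether mean-pooled SIFT separates the gestures well is a separate question that only testing on the actual data can answer.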
Now, my problems are as follows:
- I do not want to manually load a hundred images and label them.
- How can I compare my real-time frame with my dataset?
- Is this the correct way to give features as input to the SVM ...
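On the second point, a trained SVM does not compare a frame against the dataset images directly. The usual pattern is to train once, then extract the same kind of feature vector from each incoming frame and call predict on it. The sketch below shows only that control flow, under stated assumptions: classify_frames and extract_features are hypothetical names, and a dummy classifier stands in for a fitted sklearn SVC so the example runs without a camera.

```python
def classify_frames(frames, extract_features, clf):
    """Yield one predicted label per frame using an already-trained classifier."""
    for frame in frames:
        features = extract_features(frame)
        # predict expects a 2D input (n_samples, n_features), hence the extra list.
        yield clf.predict([features])[0]


class DummyClassifier(object):
    """Stand-in for a fitted sklearn SVC: label 2 if the feature sum exceeds 10."""
    def predict(self, X):
        return [2 if sum(row) > 10 else 1 for row in X]


# Fake "frames" (plain lists) and a trivial extractor, for illustration only.
frames = [[1, 2, 3], [10, 20, 30]]
labels = list(classify_frames(frames, lambda f: f, DummyClassifier()))
print(labels)  # [1, 2]
```

With a webcam, the frames would come from repeated calls to cv2.VideoCapture(0).read(), and extract_features would run the same feature pipeline that was used at training time.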
Then what should I use?
Shape recognition, maybe.
Can you tell me how to load multiple images into the SVM with labels?
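One way to avoid labelling by hand, assuming the files keep the naming pattern used in the question (gesture number, underscore, image index, e.g. 1_1.jpg or 3_16.jpg), is to read the label from the file name. This is only a sketch: collect_labeled_paths is a hypothetical helper and "dataset" is a placeholder folder name.

```python
import os
import re

# Matches names such as "1_1.jpg" or "3_16.jpg": <gesture label>_<image index>.jpg
NAME_PATTERN = re.compile(r"^(\d+)_(\d+)\.jpg$", re.IGNORECASE)

def collect_labeled_paths(folder):
    """Return sorted (path, label) pairs for every correctly named image in folder."""
    pairs = []
    for name in sorted(os.listdir(folder)):
        match = NAME_PATTERN.match(name)
        if match is None:
            continue  # skip files that do not follow the naming pattern
        pairs.append((os.path.join(folder, name), int(match.group(1))))
    return pairs

if __name__ == "__main__":
    # "dataset" is a placeholder folder name.
    if os.path.isdir("dataset"):
        for path, label in collect_labeled_paths("dataset"):
            print("%s -> %d" % (path, label))
```

Each path can then be passed to cv2.imread and the feature extraction step, with the resulting feature appended to X and the label appended to y before calling clf.fit(X, y).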
Oh, and by the way, you can simply forget about getting robust gesture classification from 10 images per gesture. State-of-the-art systems are trained on many thousands to millions of images.
I just want to check whether it works; I'm not concerned about robustness.
But it won't work :D You will get misclassifications all the time, which will give you the wrong impression of the algorithm.
Then how many images per gesture should I take?
250-1000 to start with, I guess.
Thanks.