Identifying and tracking hands in a scene
I am trying to write a HandTracker application to identify and track human hands in a scene. I have tried a number of techniques, which I will detail here. I need your help to figure out the technique or combination of techniques I should use to be successful.
What I have:
- 640×480 Bayer-encoded color stream
- 640×480 depth stream in which each pixel value is in the range 0–10000 and is registered (aligned) to the corresponding pixel in the color stream.
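Because the depth stream is registered to the color stream, a mask computed from depth can be applied directly to the color pixel at the same index. The following is a minimal sketch under assumptions not stated above: a value of 0 means "no reading", and the hand is the surface nearest the camera. `nearest_band_mask`, `apply_mask`, and the 150-unit band (in whatever units the depth stream uses) are hypothetical, and plain nested lists stand in for image buffers.

```python
# Sketch only: assumes 0 = invalid depth and that the hand is the nearest object.

def nearest_band_mask(depth, band=150):
    """Mark pixels whose depth lies within `band` units of the nearest valid reading."""
    valid = [d for row in depth for d in row if d > 0]
    if not valid:
        # No valid depth at all: nothing is selected.
        return [[False] * len(row) for row in depth]
    near = min(valid)
    return [[0 < d <= near + band for d in row] for row in depth]


def apply_mask(color, mask, fill=(0, 0, 0)):
    """Keep color pixels where the mask is True; registration makes indices line up 1:1."""
    return [[px if keep else fill for px, keep in zip(crow, mrow)]
            for crow, mrow in zip(color, mask)]
```

The band is relative to the nearest reading rather than a fixed distance, so the mask follows the hand as it moves toward or away from the sensor; a real pipeline would still need to reject noise and non-hand objects that happen to be closest.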
What I have tried:
- Haar cascade classifier. I have tried the hand datasets here, here, and here with almost no success. Some datasets produce many false positives, and some produce no detections no matter how much I wave my hand around.
- Descriptor matching with a SURF detector, BRISK extractor, and BFMatcher using knnMatch(). As the template image, I use a screenshot cropped to contain only my open hand. I then run the matching scheme on each color frame, keeping only matches that pass a ratio test and a symmetry test. Finally, I compute a bounding polygon by calling findHomography() on the matched keypoints and then perspectiveTransform(). This is somewhat successful, but it is slow and depends on the scale and orientation of the template image. Even when I get enough keypoints, the bounding polygon is often wildly incorrect.
- Contour and convex hull search. This finds way too many contours and hulls, and I don't know how to discriminate hands from everything else.
- Background removal with BackgroundSubtractorMOG2. This improves the accuracy of the contour search, because it ensures that all contours belong to the moving foreground. But it still doesn't help me discriminate between a hand and a neck (for example), and finding accurate bounding geometry is still a problem.
- CamShift. This looks like it could be useful, but I haven't gotten it working. All the examples I've found seem really complicated.
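The ratio and symmetry tests from the descriptor-matching attempt can be written independently of OpenCV. In this sketch each entry of a k=2 match list is a plain `(train_index, distance)` tuple sorted by distance, standing in for the `DMatch` pairs that `knnMatch()` returns; `ratio_test`, `symmetry_test`, and the 0.8 ratio (a commonly used value) are illustrative assumptions, not the original code. A pair survives only if it passes the ratio test in both matching directions and each side is the other's best match.

```python
# Sketch only: tuples stand in for OpenCV DMatch objects.

def ratio_test(knn_matches, ratio=0.8):
    """Return {query_idx: train_idx} for matches passing Lowe's ratio test.

    knn_matches[q] is a list of (train_idx, distance) pairs sorted by distance,
    i.e. the two nearest neighbours of query descriptor q.
    """
    good = {}
    for q, candidates in enumerate(knn_matches):
        if len(candidates) < 2:
            continue  # cannot apply the ratio test without a second neighbour
        (t1, d1), (_, d2) = candidates[0], candidates[1]
        if d1 < ratio * d2:  # best match is clearly better than the runner-up
            good[q] = t1
    return good


def symmetry_test(forward, backward):
    """Keep (query, train) pairs that are mutual best matches in both directions."""
    return sorted((q, t) for q, t in forward.items() if backward.get(t) == q)
```

In use, `forward` comes from matching template descriptors against frame descriptors and `backward` from matching in the opposite direction; only the mutual matches are then passed on to `findHomography()`.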
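For the contour and convex hull search, one heuristic sometimes used to separate an open hand from compact blobs such as a head or neck is solidity: contour area divided by convex hull area. Spread fingers leave large gaps between the contour and its hull, so solidity tends to drop. This is an assumption to test, not a guarantee, and any cutoff would need tuning on real frames. The sketch below uses Andrew's monotone chain for the hull and the shoelace formula for areas; contours are ordered lists of (x, y) points, and `convex_hull`, `polygon_area`, and `solidity` are hypothetical helpers (OpenCV's `convexHull()` and `contourArea()` would normally fill these roles).

```python
# Sketch only: pure-Python stand-ins for convex hull and area computations.

def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices counter-clockwise, no collinear points."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        # z-component of (a - o) x (b - o); > 0 means a left turn
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # Last point of each chain is the first point of the other.
    return lower[:-1] + upper[:-1]


def polygon_area(poly):
    """Shoelace formula; abs() makes the result independent of winding order."""
    n = len(poly)
    s = sum(poly[i][0] * poly[(i + 1) % n][1] - poly[(i + 1) % n][0] * poly[i][1]
            for i in range(n))
    return abs(s) / 2.0


def solidity(contour):
    """Contour area / hull area: close to 1 for convex blobs, lower for notched shapes."""
    hull_area = polygon_area(convex_hull(contour))
    return polygon_area(contour) / hull_area if hull_area else 0.0
```

For example, a 4×4 square has solidity 1.0, while the same square with a V-shaped notch cut into its top edge (a crude stand-in for the gap between two fingers) has solidity 0.8125.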
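BackgroundSubtractorMOG2 models each pixel as a mixture of Gaussians. The class below is a deliberately simpler stand-in, a single running mean per grayscale pixel, meant only to show the shape of background subtraction: blend each new frame into the background estimate and flag pixels that differ from it by more than a threshold. `RunningAverageBackground`, its `alpha` learning rate, and the threshold of 25 are hypothetical choices, not MOG2's algorithm or defaults.

```python
# Sketch only: a single-Gaussian-free running average, NOT the MOG2 algorithm.

class RunningAverageBackground:
    def __init__(self, alpha=0.05, threshold=25):
        self.alpha = alpha          # how quickly the background adapts to new frames
        self.threshold = threshold  # absolute difference that counts as foreground
        self.mean = None            # per-pixel background estimate

    def apply(self, frame):
        """Update the model with a grayscale frame; return a boolean foreground mask."""
        if self.mean is None:
            # First frame initialises the background; nothing is foreground yet.
            self.mean = [[float(v) for v in row] for row in frame]
            return [[False] * len(row) for row in frame]
        mask = []
        for r, row in enumerate(frame):
            mrow = []
            for c, v in enumerate(row):
                m = self.mean[r][c]
                mrow.append(abs(v - m) > self.threshold)
                # Every pixel is blended into the estimate, foreground included.
                self.mean[r][c] = (1 - self.alpha) * m + self.alpha * v
            mask.append(mrow)
        return mask
```

As the question notes, a mask like this only says "moving", not "hand"; it narrows the contour search but still has to be combined with some shape or depth cue.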
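CamShift is built on mean shift. Given a probability image (typically a hue-histogram back-projection of the target), mean shift repeatedly moves a search window so that it is centred on the weighted centroid of the values inside it, stopping when the window no longer moves. CamShift additionally resizes the window from the zeroth moment on each frame. OpenCV provides both as `meanShift()` and `CamShift()`. The pure-Python sketch below implements only the mean-shift step on a 2-D list of weights, using an (x, y, w, h) window like OpenCV's `Rect`; `mean_shift` and its `max_iter` default are hypothetical.

```python
import math

# Sketch only: the mean-shift core of CamShift, without the window-resizing step.

def mean_shift(weights, window, max_iter=20):
    """Shift an (x, y, w, h) window onto the centroid of `weights` until it stops moving."""
    rows, cols = len(weights), len(weights[0])
    x, y, w, h = window
    for _ in range(max_iter):
        m00 = m10 = m01 = 0.0
        for r in range(max(0, y), min(rows, y + h)):
            for c in range(max(0, x), min(cols, x + w)):
                v = weights[r][c]
                m00 += v        # total mass
                m10 += c * v    # first moment in x
                m01 += r * v    # first moment in y
        if m00 == 0:
            break  # no probability mass under the window: leave it where it is
        cx, cy = m10 / m00, m01 / m00
        # Place the window so its centre (x + (w-1)/2) lands on the centroid.
        nx = math.floor(cx - (w - 1) / 2 + 0.5)
        ny = math.floor(cy - (h - 1) / 2 + 0.5)
        nx = min(max(nx, 0), cols - w)  # keep the window inside the image
        ny = min(max(ny, 0), rows - h)
        if (nx, ny) == (x, y):
            break  # converged
        x, y = nx, ny
    return x, y, w, h
```

With a 10×10 weight grid containing a 3×3 block of ones at rows and columns 6–8, a 4×4 window starting at (4, 4) moves to (5, 5) and then settles at (6, 6), covering the whole block. Because the depth stream is registered, a depth-derived mask could in principle serve as the weight image instead of a hue back-projection; whether that tracks better is something to try rather than a known result.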
After working on this for a week, I'm starting to get more than a little frustrated. I just want to do something simple like the hand tracking in OpenNI. Unfortunately, the OpenNI 1.5 code is so obfuscated that I can't even find the hand tracking portion.
Does anyone have any ideas for how I can combine and/or refine these techniques on the color stream or incorporate the depth stream to achieve robust hand identification/tracking?