
SLAM is simultaneous localization and mapping; it has nothing to do with finger/hand tracking. Most SLAM implementations assume a static scene, and humans are anything but static. Your best bet is to get 3D data, either from stereo cameras or from a 'depth camera' such as the Kinect or one of Intel's RealSense cameras (depending on the model, these use structured light, active stereo, or time-of-flight). With both 2D and 3D information available, train a hand detector/segmenter and then a pose estimator. The pose estimator will most likely require synthetic training data, since ground-truth hand poses are impractical to obtain any other way (except perhaps from an existing system). This is no small feat, so as a first approach I recommend skin-color segmentation followed by convexity-defect analysis; there is plenty of material available on that approach.
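To make the suggested first approach more concrete, here is a minimal sketch of its two steps in plain Python, so the logic is visible without any dependencies. In practice you would use OpenCV's `cv2.cvtColor` (with `cv2.COLOR_BGR2YCrCb`), `cv2.inRange`, `cv2.findContours`, `cv2.convexHull(..., returnPoints=False)` and `cv2.convexityDefects` instead. Assumptions: the Cr/Cb thresholds (Cr 133 to 173, Cb 77 to 127) are a commonly quoted heuristic, not a tuned value; the contour is an ordered list of boundary points of the hand blob; and `skin_mask`, `convexity_defects` and `count_fingers` are hypothetical helper names, not part of any library.

```python
import math
from typing import List, Tuple

Point = Tuple[int, int]
Pixel = Tuple[int, int, int]  # (R, G, B), 8-bit


def is_skin(r: int, g: int, b: int) -> bool:
    """Classify one RGB pixel as skin using a fixed Cr/Cb box."""
    # RGB -> YCrCb using the formula from the OpenCV color-conversion docs (8-bit, delta = 128)
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cr = (r - y) * 0.713 + 128
    cb = (b - y) * 0.564 + 128
    # Assumed heuristic thresholds; tune them for your camera and lighting
    return 133 <= cr <= 173 and 77 <= cb <= 127


def skin_mask(image: List[List[Pixel]]) -> List[List[int]]:
    """Binary mask (1 = skin) for an image given as rows of RGB tuples."""
    return [[1 if is_skin(*px) else 0 for px in row] for row in image]


def _cross(o: Point, a: Point, b: Point) -> int:
    # z-component of (a - o) x (b - o); > 0 means a counter-clockwise turn
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull_indices(pts: List[Point]) -> List[int]:
    """Indices of convex hull vertices (Andrew's monotone chain, collinear points dropped)."""
    order = sorted(range(len(pts)), key=lambda i: pts[i])

    def half(idxs):
        chain: List[int] = []
        for i in idxs:
            while len(chain) >= 2 and _cross(pts[chain[-2]], pts[chain[-1]], pts[i]) <= 0:
                chain.pop()
            chain.append(i)
        return chain

    lower, upper = half(order), half(reversed(order))
    return lower[:-1] + upper[:-1]


def convexity_defects(contour: List[Point], min_depth: float):
    """(start, end, farthest, depth) for each hull edge whose pocket is deep enough."""
    n = len(contour)
    hull = sorted(set(convex_hull_indices(contour)))  # hull indices in contour order
    defects = []
    for k, start in enumerate(hull):
        end = hull[(k + 1) % len(hull)]
        (sx, sy), (ex, ey) = contour[start], contour[end]
        edge_len = math.hypot(ex - sx, ey - sy)
        if edge_len == 0:
            continue
        far, depth = None, 0.0
        i = (start + 1) % n
        while i != end:  # walk the contour points lying between two hull vertices
            px, py = contour[i]
            # perpendicular distance from the point to the hull edge
            d = abs((ex - sx) * (py - sy) - (ey - sy) * (px - sx)) / edge_len
            if d > depth:
                far, depth = i, d
            i = (i + 1) % n
        if far is not None and depth >= min_depth:
            defects.append((start, end, far, depth))
    return defects


def count_fingers(contour: List[Point], min_depth: float) -> int:
    """Heuristic: each deep defect sits between two extended fingers."""
    n = len(convexity_defects(contour, min_depth))
    # With no defects a fist and a single raised finger look alike; report 0 here
    return n + 1 if n > 0 else 0
```

For example, the hand-like polygon `[(0, 0), (12, 0), (12, 12), (10, 4), (6, 14), (2, 4), (0, 12)]` (three "fingers") yields two defects with `min_depth=2.0`, so `count_fingers` returns 3. The `min_depth` cutoff is the important design choice: without it, shallow pockets from a noisy skin mask or along the wrist would be counted as gaps between fingers. Note that this only approximates a finger count; it does not estimate a hand pose.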