I once watched a lecture that said that for meanshift (and so, pretty similarly, for camshift) using only the pixel intensity from a grayscale image is not sufficient. I previously tried implementing meanshift with just grayscale and it didn't work (the tracker just ended up chasing the object around the image). Lots of camshift examples out there track a coloured ball, but in that case the colour alone is distinctive enough. For a standard grayscale image from a normal camera, one thing you can do is augment the pixel intensity with image gradient magnitude and orientation information (easily obtained with Sobel filters).
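As a rough sketch of what I mean, here's one way to build that augmented feature: compute Sobel gradients, then bin (intensity, gradient magnitude, orientation) into a joint histogram that could serve as the target model for backprojection. This is just an illustration in plain numpy, not working tracker code; the function names and the choice of 8 bins per channel are my own.

```python
import numpy as np

def sobel_features(gray):
    # Hypothetical helper: 3x3 Sobel cross-correlation in plain numpy.
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
    ky = kx.T
    h, w = gray.shape
    pad = np.pad(gray.astype(float), 1, mode="edge")
    gx = np.zeros((h, w))
    gy = np.zeros((h, w))
    for i in range(3):
        for j in range(3):
            window = pad[i:i + h, j:j + w]
            gx += kx[i, j] * window
            gy += ky[i, j] * window
    mag = np.hypot(gx, gy)          # gradient magnitude
    ori = np.arctan2(gy, gx)        # gradient orientation in [-pi, pi]
    return mag, ori

def joint_histogram(gray, mag, ori, bins=(8, 8, 8)):
    # Joint (intensity, magnitude, orientation) histogram, normalised so
    # it can be used as a probability model for backprojection.
    feats = np.stack([gray.ravel(), mag.ravel(), ori.ravel()], axis=1)
    ranges = [(0.0, 256.0), (0.0, float(mag.max()) + 1e-9), (-np.pi, np.pi)]
    hist, _ = np.histogramdd(feats, bins=bins, range=ranges)
    return hist / max(hist.sum(), 1e-9)
```

You'd build this histogram over the initial object window, then backproject it onto each new frame and run meanshift/camshift on the result, exactly as the usual colour-histogram examples do but with the richer feature.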
However, you mention a time-of-flight camera, and in that situation I have no idea. Hopefully somebody with a bit more insight will be able to help.