For the Kinect v1 you can use libfreenect or OpenNI2 (OpenNI2 is more complete, but I'm not sure it has Python bindings).

Then enable depth-to-color registration so that the depth and color images are aligned pixel by pixel: in OpenNI2 set the image registration mode to IMAGE_REGISTRATION_DEPTH_TO_COLOR; in libfreenect request depth in the FREENECT_DEPTH_REGISTERED format.

Create one stream for color and one for depth, and capture both continuously. The example code shipped with each library shows how to do this.
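A minimal sketch of the capture loop with the libfreenect Python wrapper, assuming it is installed and that your build exposes sync_get_video(), sync_get_depth() and the DEPTH_REGISTERED constant (check your version). The module is passed in as a parameter so the loop can be tried without a sensor attached; capture_loop and process are hypothetical names.

```python
import numpy as np

def capture_loop(fn, process, max_frames=None):
    """Grab (color, registered depth) pairs and hand them to process().

    fn:      the libfreenect Python module (``import freenect``), passed in
             so the loop can be exercised without a sensor.
    process: hypothetical callback(bgr, depth_mm); return False to stop.
    Returns the number of frames handled.
    """
    frames = 0
    while max_frames is None or frames < max_frames:
        rgb, _ = fn.sync_get_video()  # HxWx3 uint8 in RGB order
        # Assumed: DEPTH_REGISTERED gives depth in mm, aligned to the RGB image
        depth, _ = fn.sync_get_depth(format=fn.DEPTH_REGISTERED)
        bgr = np.ascontiguousarray(rgb[:, :, ::-1])  # OpenCV expects BGR
        frames += 1
        if process(bgr, depth) is False:
            break
    return frames
```

With a real device you would call capture_loop(freenect, my_callback) and do the masking and depth measurement inside the callback.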

For each frame, build a mask of the colored objects you are interested in, as you already did. If there are several objects of the same color, separate them with connected component analysis (e.g. cv2.connectedComponentsWithStats).

Then compute the mean depth over the masked area (i.e. the area covered by one colored box) by applying the mean function to the depth image: dmean = cv2.mean(depth, mask)[0]. Approximate the object's center of gravity by the middle of its bounding box: cx = x + w/2, cy = y + h/2.
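A sketch of the depth and center computation in NumPy. It assumes registered depth in millimetres where 0 means "no reading"; those pixels are excluded so they don't drag the mean toward zero (a plain cv2.mean(depth, mask) would include them). The function name object_depth_and_center is hypothetical.

```python
import numpy as np

def object_depth_and_center(depth, mask, bbox):
    """Return (mean depth, cx, cy) for one object, or None if no valid depth."""
    # Assumed: 0 marks missing depth, so keep only masked pixels with a reading
    valid = (mask > 0) & (depth > 0)
    if not valid.any():
        return None
    dmean = float(depth[valid].mean())
    x, y, w, h = bbox
    # Middle of the bounding box as the object's center
    cx, cy = x + w / 2.0, y + h / 2.0
    return dmean, cx, cy
```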

Now you have everything needed to get the (x, y, z) coordinates of the object: pass cx, cy and dmean to freenect_camera_to_world() in libfreenect or to CoordinateConverter::convertDepthToWorld() in OpenNI2.
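If those functions are not reachable from Python, the same conversion can be sketched with the pinhole camera model. The intrinsics below (fx = fy = 525, principal point (319.5, 239.5)) are commonly quoted approximations for a 640x480 Kinect v1 image, not values taken from either library; calibrate your own sensor for accurate results. Sign conventions (e.g. whether Y points up or down) may differ from the library functions. depth_pixel_to_world is a hypothetical name.

```python
def depth_pixel_to_world(u, v, z, fx=525.0, fy=525.0, cx0=319.5, cy0=239.5):
    """Back-project pixel (u, v) with depth z to camera-frame (X, Y, Z).

    Pinhole model: X = (u - cx0) * z / fx, Y = (v - cy0) * z / fy, Z = z.
    Default intrinsics are assumed rough Kinect v1 values; X and Y come out
    in the same unit as z (mm in, mm out). Y grows downward, as in the image.
    """
    x = (u - cx0) * z / fx
    y = (v - cy0) * z / fy
    return x, y, z
```

For example, depth_pixel_to_world(cx, cy, dmean) turns the center and mean depth from the previous step into camera-frame coordinates in millimetres.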