You need at least two cameras (more is better), or a depth camera such as a Kinect.

I'm actually in the process of putting together a module that does just this, based on THIS PAPER. For each camera, it will take the rvec and tvec as inputs, as well as the camera matrix and distortion parameters.

However you do it, you'll need to know where the camera(s) are in relation to your coordinate system. You'll also need a way to match points between cameras, which is not an easy problem.
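
To give a rough idea of what the multi-camera approach looks like once you have the poses and a matched point, here is a minimal sketch using plain linear (DLT) triangulation in NumPy. This is not the method from the paper or the module mentioned above, just a generic illustration. The function names (`rodrigues`, `projection_matrix`, `triangulate`, `project`) and all the numbers in the demo are made up for the example. It assumes the pixel coordinates are already undistorted (in OpenCV you could do that with `cv2.undistortPoints`) and that rvec/tvec map world coordinates into the camera frame, as `cv2.solvePnP` returns them.

```python
import numpy as np

def rodrigues(rvec):
    """Convert a rotation vector (axis * angle) into a 3x3 rotation matrix."""
    rvec = np.asarray(rvec, dtype=float).reshape(3)
    theta = np.linalg.norm(rvec)
    if theta < 1e-12:
        return np.eye(3)
    k = rvec / theta
    K = np.array([[0.0, -k[2], k[1]],
                  [k[2], 0.0, -k[0]],
                  [-k[1], k[0], 0.0]])
    return np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * (K @ K)

def projection_matrix(camera_matrix, rvec, tvec):
    """Build the 3x4 matrix P = K [R | t] for one camera."""
    R = rodrigues(rvec)
    t = np.asarray(tvec, dtype=float).reshape(3, 1)
    return camera_matrix @ np.hstack([R, t])

def triangulate(projections, pixels):
    """Linear triangulation of one point seen by two or more cameras.

    projections: list of 3x4 projection matrices
    pixels: list of (u, v) undistorted pixel coordinates, one per camera
    """
    rows = []
    for P, (u, v) in zip(projections, pixels):
        rows.append(u * P[2] - P[0])
        rows.append(v * P[2] - P[1])
    # The homogeneous solution is the right singular vector belonging to
    # the smallest singular value.
    _, _, vt = np.linalg.svd(np.array(rows))
    X = vt[-1]
    return X[:3] / X[3]

def project(P, point):
    """Project a 3D world point to pixel coordinates with matrix P."""
    x = P @ np.append(point, 1.0)
    return x[:2] / x[2]

# Hypothetical setup: two cameras sharing the same intrinsics.
camera_matrix = np.array([[800.0, 0.0, 320.0],
                          [0.0, 800.0, 240.0],
                          [0.0, 0.0, 1.0]])
P1 = projection_matrix(camera_matrix, [0.0, 0.0, 0.0], [0.0, 0.0, 0.0])
P2 = projection_matrix(camera_matrix, [0.0, 0.2, 0.0], [-1.0, 0.0, 0.0])

# Simulate a matched point by projecting a known world point into both views.
world_point = np.array([0.3, -0.2, 5.0])
pixels = [project(P1, world_point), project(P2, world_point)]

recovered = triangulate([P1, P2], pixels)
print(recovered)
```

With real data the matched pixels are noisy, so the recovered point is only an approximation, and adding more cameras to the lists generally helps. OpenCV also ships `cv2.Rodrigues` and `cv2.triangulatePoints`, which cover the same steps for the two-camera case.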

Using a depth camera is probably easier, but I've never really used one, so I can't help.