
First, you have to transform the depth image coordinates into a 3D point cloud, with each point described by its x, y, z coordinates.
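A minimal sketch of that back-projection, assuming a pinhole camera model. The intrinsics `fx`, `fy`, `cx`, `cy` and the function name `depth_to_points` are assumptions for illustration (take the real values from your sensor's calibration), and the Y axis is flipped so that it points up, which is the convention the rest of this answer relies on:

```python
def depth_to_points(depth, fx, fy, cx, cy):
    """Back-project a depth image (rows of depths in metres) to (x, y, z) points.

    Assumed pinhole intrinsics: focal lengths fx, fy and principal point cx, cy,
    all in pixels. The result uses a Y-up convention with the camera at the origin.
    """
    points = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if z <= 0:  # zero or negative depth is treated as a missing measurement
                continue
            x = (u - cx) * z / fx
            y = -(v - cy) * z / fy  # image rows grow downward, so negate for Y up
            points.append((x, y, z))
    return points
```

This also assumes the camera is held level. If it is tilted, the cloud would have to be rotated first so that Y is really vertical.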

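If you can detect the floor plane (which is probably the lowest horizontal plane in the image, at height Y0), then the camera is positioned at height -Y0 above it. The camera is the origin of the point cloud and Y points up, so Y0 is negative.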

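You can find it, for example, by iterating over height values from 0 down to Ymin (the lowest Y in the cloud) and counting how many points you have at each height. The floor should show up as the lowest height with a large number of points.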
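One way to sketch that height sweep is as a histogram over Y. The function name `estimate_floor_y`, the bin size, and the `min_points` threshold are assumptions to tune for your sensor and scene, and the points are assumed to be (x, y, z) tuples with Y pointing up:

```python
import math
from collections import Counter

def estimate_floor_y(points, bin_size=0.05, min_points=50):
    """Return the Y of the lowest well-populated height bin, or None if there is none.

    Only points below the camera (y < 0) are considered. Sparse bins below the
    floor (noise, reflections) are ignored by the min_points threshold.
    """
    counts = Counter(math.floor(y / bin_size) for _, y, _ in points if y < 0)
    dense_bins = [b for b, n in counts.items() if n >= min_points]
    if not dense_bins:
        return None
    return (min(dense_bins) + 0.5) * bin_size  # centre of the lowest dense bin
```

The camera height is then `-estimate_floor_y(points)`. A plane fit such as RANSAC would be more robust, but this is closer to the simple counting described above.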

If the floor is not visible, but you have the bounding box of the person, the floor level is probably at the minimum Y value of the bounding box. The same applies to a skeleton (unless the person is levitating).
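As a small sketch of this fallback, assuming the person's bounding-box corners or skeleton joints are already available as (x, y, z) tuples in the same Y-up camera frame (the function name `camera_height_from_person` is hypothetical):

```python
def camera_height_from_person(person_points):
    """Camera height above the floor, taking the person's lowest point as floor level."""
    floor_y = min(y for _, y, _ in person_points)
    return -floor_y
```

This only holds while the person is standing on the floor, so a jump or a person sitting on a table will skew the estimate.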