With a single shot and no other information, the answer is simple: you can't, because depth information is lost when the 3D scene is projected onto the 2D image.
You can recover 3D world coordinates using stereoscopic vision: take 2 shots, moving the camera between them with a pure translation (in general you can also rotate the camera, but this gives no advantage), and calculate the depth Z of a point visible in both images from its disparity (d), the camera focal length (f, which you know from calibration) and the baseline (B, the distance the camera moved between the 2 positions). With d and f both expressed in pixels, Z comes out in the units of B:
Z = fB/d
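As a rough illustration of the disparity relation above, here is a minimal Python sketch. The function names, the principal point (cx, cy) and the sample numbers are assumptions made for the example, not something from the question; it also assumes an ideal pinhole camera with rectified images and the same focal length in pixels along both axes.

```python
def depth_from_disparity(focal_px, baseline, disparity_px):
    """Depth Z = f * B / d; Z is returned in the units of the baseline."""
    if disparity_px <= 0:
        # Zero disparity means the point is at infinity; negative means a bad match.
        raise ValueError("disparity must be positive")
    return focal_px * baseline / disparity_px


def backproject(u, v, depth, focal_px, cx, cy):
    """Turn pixel (u, v) plus its depth into camera-frame (X, Y, Z).

    (cx, cy) is the principal point in pixels (assumed known from calibration).
    """
    x = (u - cx) * depth / focal_px
    y = (v - cy) * depth / focal_px
    return (x, y, depth)


# Hypothetical values: f = 800 px, B = 0.1 m, d = 40 px.
z = depth_from_disparity(800.0, 0.1, 40.0)
print(backproject(420.0, 240.0, z, 800.0, 320.0, 240.0))
```

The second function is only there to show how the depth, once known, gives the remaining two coordinates of the point in the camera frame.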
Retrieving depth from a single shot is still a subject of research. It can be done, but you need some additional information in the image, such as an object of known size placed at the same distance from the camera as the object you're trying to locate.
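For the known-size-object case, one common approach (an assumption here, not necessarily what the research mentioned above does) is the pinhole similar-triangles relation Z = f * H / h, where H is the real size of the reference object and h its size in the image in pixels. A minimal sketch, with a hypothetical function name and made-up numbers, assuming the object is roughly parallel to the image plane:

```python
def depth_from_known_size(focal_px, real_size, size_px):
    """Depth Z = f * H / h; Z is returned in the units of real_size."""
    if size_px <= 0:
        raise ValueError("size in pixels must be positive")
    return focal_px * real_size / size_px


# Hypothetical values: f = 800 px, object 0.5 m tall, 100 px tall in the image.
print(depth_from_known_size(800.0, 0.5, 100.0))
```

Anything lying at the same distance as the reference object can then be assigned that same Z.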