With a single shot and no other information the answer is easy: you can't, because depth information is lost during image formation.

You can recover 3D world coordinates using stereoscopic vision: take 2 shots, moving the camera between them with a pure translation (in general you can also rotate the camera, but this gives no advantage), and calculate the depth Z of a point visible in both images from its disparity (d), the camera focal length (f, which you know from calibration) and the baseline (B, the translation of the camera between the 2 positions):

Z = fB/d

With f and d in pixels, Z comes out in the same units as B.
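
As a rough illustration of the formula above, here is a minimal Python sketch. The function name `depth_from_disparity` and the example numbers are made up for illustration; it assumes the two images are rectified, that f and d are both expressed in pixels, and that d is strictly positive.

```python
def depth_from_disparity(focal_px: float, baseline: float, disparity_px: float) -> float:
    """Depth Z = f * B / d for a rectified stereo pair.

    focal_px     -- focal length f in pixels (assumed known from calibration)
    baseline     -- camera translation B between the two shots (e.g. in metres)
    disparity_px -- horizontal shift d of the same point between the images, in pixels
    """
    if disparity_px <= 0:
        # Zero disparity corresponds to a point at infinity; negative values
        # usually mean a bad match or swapped images.
        raise ValueError("disparity must be positive")
    return focal_px * baseline / disparity_px


# Hypothetical values: f = 800 px, B = 0.25 m, d = 40 px
z = depth_from_disparity(800.0, 0.25, 40.0)
print(z)  # depth in metres, same unit as the baseline
```

Note that depth is inversely proportional to disparity, so far-away points (small d) are the ones most affected by matching errors.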

Retrieving depth from a single shot is still a subject of research. It can be done, but you need some additional information in the image, such as an object of known size placed at the same distance as the object you're trying to identify.
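
For the known-size case, one common approach (an assumption here, not something spelled out above) is the pinhole similar-triangles relation Z = f * W / w, where W is the real width of the reference object and w is its width in the image in pixels. A minimal sketch, with a hypothetical function name and made-up numbers, assuming the reference object is roughly parallel to the image plane:

```python
def depth_from_known_size(focal_px: float, real_width: float, width_px: float) -> float:
    """Depth Z = f * W / w under a pinhole camera model.

    focal_px   -- focal length f in pixels (assumed known from calibration)
    real_width -- true width W of the reference object (e.g. in metres)
    width_px   -- measured width w of that object in the image, in pixels
    """
    if width_px <= 0:
        raise ValueError("measured width must be positive")
    return focal_px * real_width / width_px


# Hypothetical values: f = 800 px, W = 0.5 m, w = 100 px
z = depth_from_known_size(800.0, 0.5, 100.0)
print(z)  # distance to the reference object, in metres
```

The result only applies to the object you care about if it really sits at the same distance as the reference object.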