Dear all,

Since I was not able to find an answer to this question in this forum, I decided to sign up and post it. Out of curiosity, I decided to do a small OpenCV project. The aim is to measure an object's height via a single camera. The camera may safely be assumed to be fixed, and the object's distance to the camera is known, so the problem should be solvable.

## Algorithm

Currently my algorithm is as follows:

1. Calibrate the camera.
2. Manually choose a point on the video stream that lies on the ground.
3. Detect the object and its upper boundary (y-coordinate) on the video stream.
4. Calculate the height: the difference between the reference point from (2) and the upper boundary from (3) is the object's height.
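The steps above can be sketched as follows, assuming an ideal pinhole model with no lens distortion; the calibration values and detected pixel rows are hypothetical constants so the arithmetic stays visible:

```python
# Minimal sketch of steps (1)-(4); all numbers below are hypothetical
# stand-ins for real calibration and detection results.

def height_mm(v_ground, v_top, c_y, f_y, z_mm):
    """Back-project two image rows to camera-frame Y at depth z and subtract.
    Image rows grow downward, so the ground row is larger than the top row."""
    y_ground = (v_ground - c_y) * z_mm / f_y
    y_top = (v_top - c_y) * z_mm / f_y
    return y_ground - y_top

# (1) hypothetical calibration output (pixels)
f_y, c_y = 750.0, 250.0
# (2) clicked ground row and (3) detected upper-boundary row (pixels)
v_ground, v_top = 400.0, 100.0
# (4) height at a known distance of 1500 mm
print(height_mm(v_ground, v_top, c_y, f_y, 1500.0))  # 600.0
```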

## Problem

This seems to work; however, there is an error of 3-10 centimeters. The error seems to depend on (a) the quality of the calibration, (b) the location of the object along the video's x-axis (i.e. the camera does not seem to be parallel to the ground), and (c) the y-position on the screen (the higher the object appears, the larger the error).

As a result, I guess that I am doing something entirely wrong. To be more concrete, I will lay out the steps (1) to (4) in greater detail.

## (1) Camera Calibration

Is done via a chessboard pattern whose squares are 26 mm in size. Basically I use an adaptation of the Emgu CV (C# bindings) examples and this link: http://dasl.mem.drexel.edu/~noahKuntz/openCVTut10.html
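For reference, the calibration step produces a 3x3 intrinsic matrix laid out like this (the numbers here are illustrative, not from a real calibration run):

```python
# Layout of the 3x3 intrinsic camera matrix K returned by chessboard
# calibration; all values below are hypothetical.
K = [
    [750.0,   0.0, 320.0],   # f_x, skew, c_x
    [  0.0, 750.0, 250.0],   # 0,   f_y,  c_y
    [  0.0,   0.0,   1.0],
]
f_y = K[1][1]  # row 1, column 1 (0-based indexing)
c_y = K[1][2]  # row 1, column 2 (0-based indexing)
print(f_y, c_y)  # 750.0 250.0
```

Note that f_y and c_y sit in different columns of row 1, which matters when reading them out of the calibration result.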

## (2) Manually choose a point that is on the ground

For reasons of convenience I simply click on the x,y-coordinate of the video stream where the (image of the) ground intersects with the (image of the) wall within my room. Simple enough...

## (3) Detect upper object boundaries

Simple feature detection, which works well (verified by drawing circles around the detected points).

## (4) Calculate height

Here it gets a bit tricky, though my approach is fairly simple. According to http://docs.opencv.org/modules/calib3d/doc/camera_calibration_and_3d_reconstruction.html (more specifically this formula: http://docs.opencv.org/_images/math/69a88b04c61001bf4e198abae39569e8bc3e81c2.png) one should be able to compute real-world Y-coordinates by calculating y = (v - c_y) * z / f_y. Using this formula I calculate y_upperBoundary and y_ground in real-world coordinates (relative to the camera's position in real-world coordinates, I assume). I provide the following inputs:

- v = y-coordinate on the video stream for the upper boundary of the object or the ground, respectively. Measured in pixels.
- f_y = entry of the intrinsic camera matrix at row 1, column 1 (starting to count from 0). Measured in pixels.
- c_y = entry of the intrinsic camera matrix at row 1, column 2 (starting to count from 0). Measured in pixels.
- z = real-world Z-distance from the camera to the object, measured in mm (the calibration has also been set up in mm). I then calculate (y_ground - y_upperBoundary) / 1000 to get the object's height in metres (the intermediate values are in mm).

I assume the last step (calculating y_ground minus y_upperBoundary) is necessary, because otherwise I'd measure the object's y-position relative to the intersection of the camera coordinate system's Z-axis with the object. But I need the height from the ground.
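A quick worked check of that subtraction step, with hypothetical numbers: when the ground point and the upper boundary share the same depth z, the principal point c_y cancels, so the height depends only on the pixel difference between the two rows:

```python
# Back-project an image row v to a camera-frame Y coordinate at depth z,
# per y = (v - c_y) * z / f_y; all numeric values below are hypothetical.
def back_project_y(v, c_y, f_y, z):
    return (v - c_y) * z / f_y

c_y, f_y, z = 250.0, 750.0, 1500.0
v_ground, v_top = 400.0, 100.0

# Subtracting the two back-projected Y values...
h_subtract = back_project_y(v_ground, c_y, f_y, z) - back_project_y(v_top, c_y, f_y, z)
# ...gives the same result as using the pixel difference directly,
# because c_y cancels when both points share the same z.
h_direct = (v_ground - v_top) * z / f_y
print(h_subtract, h_direct)  # both 600.0 (mm); divide by 1000 for metres
```

This also illustrates that the subtraction only removes the dependence on c_y when both points really lie at the same depth, which a tilted camera would violate.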

## Questions

Why the previously described problems occur is beyond my knowledge. Reading "Learning OpenCV" has confused me even more. More specifically:

- What explains the problems described above (what's wrong with the algorithm)?
- Why does the pixel position on the image plane seem to matter (the higher, the larger the error)?
- Don't I need to take the distortion coefficients and the extrinsic camera parameters into account?
- How can I account for the fact that the image of the camera does not seem to be parallel to the floor?

I also noticed that my formula refers to lower-case y while the OpenCV documentation expects real-world coordinates to be upper-case Y. Does my mistake lie there?

Any help is greatly appreciated. Thanks in advance, Case1