
Object height detection (single camera)

Dear all,

Since I could not find an answer to this question in this forum, I decided to sign up and post it. Out of curiosity, I started a small OpenCV project: the aim is to measure an object's height with a single camera. The camera can safely be assumed to be fixed, and the object's distance to the camera is known. Therefore, the problem should be solvable.

Algorithm

Currently my algorithm is as follows:
(1) Calibrate the camera.
(2) Manually choose a point on the video stream that lies on the ground.
(3) Detect the object and its upper boundary (y-coordinate) in the video stream.
(4) Calculate the height: the difference between the ground reference point from (2) and the upper object boundary from (3) is the object's height.

Problem

This seems to work; however, there is an error of 3 to 10 centimeters. The error seems to depend on (a) the quality of the calibration, (b) the object's position along the video's x-axis (i.e. the camera does not seem to be parallel to the ground), and (c) the object's y position on the screen (the higher the object, the larger the error).
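To explore observation (c), the following hypothetical simulation (Python, standard library only) projects a vertical object through an ideal pinhole camera pitched slightly downward, then estimates its height with the level-camera estimate (v_ground - v_top) * z / f_y, using the known horizontal distance as z. The function name and all numbers (f_y = 800 px, c_y = 240 px, camera 500 mm above the floor, object 2000 mm away, 2 degree pitch) are assumptions for illustration; it models pitch only, not roll, lens distortion or calibration error.

```python
import math

def simulate_naive_height(h_true, cam_height, z_horiz, pitch_deg, f_y=800.0, c_y=240.0):
    """Hypothetical check: project a vertical object with a camera pitched down
    by pitch_deg, then estimate its height as if the camera were level.

    World frame (assumed): Y points down, Z points forward horizontally, the
    object's base is on the floor cam_height below the camera, and z_horiz is
    the known horizontal distance that the naive formula uses as z.
    """
    t = math.radians(pitch_deg)

    def row(y_w):
        # Rotate the world point (Y down, Z forward) into the pitched camera frame.
        y_c = y_w * math.cos(t) - z_horiz * math.sin(t)
        z_c = z_horiz * math.cos(t) + y_w * math.sin(t)
        # Ideal pinhole projection to an image row (pixels).
        return f_y * y_c / z_c + c_y

    v_ground = row(cam_height)
    v_top = row(cam_height - h_true)
    # Naive estimate: assumes a level camera and depth z_horiz for both points.
    return (v_ground - v_top) * z_horiz / f_y

for h in (250.0, 500.0, 750.0):
    est = simulate_naive_height(h, cam_height=500.0, z_horiz=2000.0, pitch_deg=2.0)
    print(f"true {h:.0f} mm -> estimated {est:.1f} mm (error {est - h:+.1f} mm)")
```

At zero pitch the estimate equals the true height; with pitch, the error depends on where the top and bottom points fall in the image, so plugging in your own geometry gives a rough idea of how much tilt alone could contribute.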

As a result, I suspect I am doing something fundamentally wrong. To be more concrete, I will lay out steps (1) to (4) in greater detail.

(1) Camera Calibration

Calibration is done with chessboard patterns whose squares each measure 26 mm. Basically, I use an adaptation of the Emgu CV (C# bindings) examples and this link: http://dasl.mem.drexel.edu/~noahKuntz/openCVTut10.html
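The square size only fixes the unit of the 3D reference points handed to the calibration; the intrinsic values f_y and c_y come out in pixels regardless. A minimal plain-Python sketch of how those reference points are usually laid out, assuming a hypothetical board with 9 x 6 inner corners (only the 26 mm comes from this post). In OpenCV these points would be passed as objectPoints to cv2.calibrateCamera together with the corners found by cv2.findChessboardCorners.

```python
# Hypothetical board: 9 x 6 inner corners (assumed); 26 mm squares (from the post).
SQUARE_SIZE_MM = 26.0
PATTERN_SIZE = (9, 6)  # (corners per row, corners per column)

def chessboard_object_points(pattern_size, square_size):
    """Return (X, Y, Z) corner coordinates in mm, row by row, on the plane Z = 0."""
    cols, rows = pattern_size
    return [(c * square_size, r * square_size, 0.0)
            for r in range(rows) for c in range(cols)]

points = chessboard_object_points(PATTERN_SIZE, SQUARE_SIZE_MM)
print(len(points))  # 54
print(points[1])    # (26.0, 0.0, 0.0)
print(points[-1])   # (208.0, 130.0, 0.0)
```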

(2) Manually choose a point that is on the ground

For convenience, I simply click on the x,y-coordinate in the video stream where the (image of the) ground intersects the (image of the) wall of my room. Simple enough...

(3) Detect upper object boundaries

This uses simple feature detection, which works well (verified by drawing circles around the detected points).

(4) Calculate height

Here it gets a bit tricky, though my approach is fairly simple. According to http://docs.opencv.org/modules/calib3d/doc/camera_calibration_and_3d_reconstruction.html (more specifically, this formula: http://docs.opencv.org/_images/math/69a88b04c61001bf4e198abae39569e8bc3e81c2.png), one should be able to compute real-world Y-coordinates as y = (v - c_y) * z / f_y. Using this formula, I calculate y_upperBoundary and y_ground in real-world coordinates (relative to the camera's position in the real world, I assume). I provide the following inputs:
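The back-projection formula y = (v - c_y) * z / f_y can be written as a one-line function. A minimal sketch, assuming v is an already undistorted pixel row (e.g. after cv2.undistortPoints with P set to the camera matrix) and z is the depth along the camera's optical axis, which equals the straight-line distance to the point only when the point lies on that axis. The function name and the numbers in the usage line are hypothetical.

```python
def pixel_row_to_camera_y(v, f_y, c_y, z):
    """Back-project image row v (pixels) to camera-frame Y at depth z.

    Implements y = (v - c_y) * z / f_y from the pinhole model.
    Assumes v is undistorted; Y points downward like the image v axis,
    and the result has the same unit as z.
    """
    return (v - c_y) * z / f_y

print(pixel_row_to_camera_y(340.0, 800.0, 240.0, 2000.0))  # 250.0
```

With these hypothetical values the point lies 250 mm below the optical axis; a row above c_y gives a negative y.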

v = y-coordinate in the video stream of the object's upper boundary or of the ground, respectively. Measured in pixels.
f_y = entry of the intrinsic camera matrix at row 1, column 1 (counting from 0). Measured in pixels.
c_y = entry of the intrinsic camera matrix at row 1, column 2 (counting from 0). Measured in pixels.
z = real-world Z-distance from the camera to the object, measured in mm (the calibration was also set up in mm). I then calculate (y_ground - y_upperBoundary) / 1000 to get the object's height in meters.
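The inputs above can be read from a 3x3 intrinsic matrix laid out the way OpenCV stores it, with f_y at row 1, column 1 and c_y at row 1, column 2. A sketch with a hypothetical matrix and hypothetical pixel rows; plain nested lists stand in for the Emgu CV or NumPy matrix, and the function name is illustrative only.

```python
# Hypothetical intrinsic matrix (pixels), in OpenCV's layout:
# [[f_x, 0,   c_x],
#  [0,   f_y, c_y],
#  [0,   0,   1  ]]
K = [[800.0, 0.0, 320.0],
     [0.0, 800.0, 240.0],
     [0.0, 0.0, 1.0]]

def object_height_mm(K, v_ground, v_top, z_mm):
    """Height from two pixel rows, assuming a level camera and depth z_mm."""
    f_y = K[1][1]  # row 1, column 1
    c_y = K[1][2]  # row 1, column 2
    y_ground = (v_ground - c_y) * z_mm / f_y
    y_top = (v_top - c_y) * z_mm / f_y
    # Image rows grow downward, so the ground row has the larger value.
    return y_ground - y_top  # already in mm because z is in mm

h_mm = object_height_mm(K, v_ground=400.0, v_top=100.0, z_mm=2000.0)
print(h_mm)         # 750.0
print(h_mm / 1000)  # 0.75 (meters)
```

Because c_y appears in both terms, it cancels: the height reduces to (v_ground - v_top) * z / f_y, so under the level-camera assumption only f_y, z and the two pixel rows influence the result.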

I assume the last step (subtracting y_upperBoundary from y_ground) is necessary because otherwise I would measure the object's y position relative to the point where the Z-axis of the camera's coordinate system meets the object. But I need the height above the ground.
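The reasoning about subtracting y_ground can be checked on synthetic data: project the floor point and the top of an object of known height through an ideal, level pinhole camera, then back-project both rows. All values (camera 500 mm above the floor, object 750 mm tall at 2000 mm depth, f_y = 800 px, c_y = 240 px) and the helper names are hypothetical.

```python
def project_row(Y, Z, f_y, c_y):
    """Ideal pinhole projection of camera-frame (Y, Z) to image row v."""
    return f_y * Y / Z + c_y

def back_project_row(v, Z, f_y, c_y):
    """Inverse of project_row for a known depth Z."""
    return (v - c_y) * Z / f_y

F_Y, C_Y = 800.0, 240.0  # hypothetical intrinsics (pixels)
CAM_HEIGHT = 500.0       # hypothetical camera height above the floor, mm
Z = 2000.0               # hypothetical depth of the object, mm
H_OBJ = 750.0            # hypothetical true object height, mm

# Level camera with Y pointing down: the floor lies CAM_HEIGHT below the optical axis.
v_ground = project_row(CAM_HEIGHT, Z, F_Y, C_Y)
v_top = project_row(CAM_HEIGHT - H_OBJ, Z, F_Y, C_Y)

y_ground = back_project_row(v_ground, Z, F_Y, C_Y)
y_top = back_project_row(v_top, Z, F_Y, C_Y)
print(y_ground)          # 500.0 (the camera height, not the object height)
print(y_ground - y_top)  # 750.0 (the object height)
```

On its own, y_ground equals the camera's height above the floor; the difference recovers the object height exactly. This relies on the camera being level, so that the floor point and the object's top share the same depth z.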

Questions

I cannot explain why the problems described above occur, and reading "Learning OpenCV" has confused me even more. More specifically:

What explains the problems described above (what's wrong with the algorithm)?
Why does the pixel position on the image plane seem to matter (the higher, the larger the error)?
Don't I need to take the distortion coefficients and the extrinsic camera parameters into account?
How can I account for the fact that the camera does not seem to be parallel to the floor?

I also noticed that my formula uses lower-case y, while the OpenCV documentation denotes real-world coordinates with upper-case Y. Does my mistake lie there?

Any help is greatly appreciated. Thanks in advance, Case1