Revision history [back]

Outsider seeking advice on cuboid detection & robot localization

I am on two inexperienced college robotics teams that need to use computer vision to solve similar types of problems. I am focusing on using a video stream for localization ("where is the robot relative to this object?").

The first (and seemingly simplest) task I am trying to accomplish is to, given an image which contains a single block (cuboid, aka rectangular prism, of known color and dimension) lying on an even floor, determine the block's distance from the robot and its orientation. The camera's height, pitch, FOV, etc are all presumed to be known and constant.

I am self taught and thus lack the benefits of knowing how I should approach common problems. Thankfully I have a strong math background and can understand the computer vision theory which I have read so far. All the same, I would like some insight into how knowledgeable people would go about solving this problem. If there is a preferred text on computer vision, I would appreciate a link to it as well.

What follows is simply a representative summary of my attempts and undirected tinkering so far. Any questions mentioned in passing are not the primary purpose of this post:

I know enough to be able to generate a binary image of blobs that are sufficiently close to the target HSV color. And of course, I've experimented with blurring the image to varying degrees before doing any of this.

The block is slightly glossy, and thus the binary image has a corresponding hole. I know I can fix this with the closure operator, but that seems to turn the entire image a bit blocky. Also, vertical (never horizontal) stripes of white appear between sufficiently close blobs. What sort of kernel should I pass to it MorphologyEx to prevent this?

I tried using contours to find the boundary of an object, but found that it uses far more points than the 6 that a human would use. It also seems to be noisy. I've yet to try a convex hull approach because docs.opencv.org seems to be down at the moment.

GoodFeaturesToTrack frequently has false positives/negatives in detecting the six/seven visible corners of the block, even under seemingly ideal conditions. As an alternative, I suppose I could run edge detection, then Hough lines, then pair lines together based off of similarity of angle, and look for the outline of my block in triplets of pairs of lines... but I have the feeling that this is not the proper way to approach the problem. Hence why I am asking for insight.

PS: I started using OCV 2.4 about 5 days ago. I start using Python at the encouragement of my team leaders. Should I bite the bullet and learn to use OCV in C++ instead of Python? I understand the OCV C++ code that I have seen. I am one of/the most capable programmer on either team and have never used SWIG or Boost.Python, and will likely need to allow my object detection to interact with other C++ code.