Outsider seeking advice on cuboid detection & robot localization

I am on two inexperienced college robotics teams that need to use computer vision to solve similar types of problems. I am focusing on using a video stream for localization ("where is the robot relative to this object?").

The first (and seemingly simplest) task I am trying to accomplish is this: given an image containing a single block (a cuboid, i.e. a rectangular prism, of known color and dimensions) lying on a level floor, determine the block's distance from the robot and its orientation. The camera's height, pitch, FOV, etc. are all presumed to be known and constant.

I am self-taught and thus lack the benefit of knowing how I should approach common problems. Thankfully I have a strong math background and can understand the computer vision theory I have read so far. All the same, I would like some insight into how knowledgeable people would go about solving this problem. If there is a preferred text on computer vision, I would appreciate a link to it as well.

What follows is simply a representative summary of my attempts and undirected tinkering so far. Any questions mentioned in passing are not the primary purpose of this post:

I know enough to be able to generate a binary image of blobs that are sufficiently close to the target HSV color. And of course, I've experimented with blurring the image to varying degrees before doing any of this.

The block is slightly glossy, and thus the binary image has a corresponding hole. I know I can fix this with the morphological closing operator, but that seems to turn the entire image a bit blocky. Also, vertical (never horizontal) stripes of white appear between sufficiently close blobs. What sort of kernel should I pass to MorphologyEx to prevent this?

I tried using contours to find the boundary of an object, but found that it uses far more points than the 6 that a human would use. It also seems to be noisy. I've yet to try a convex hull approach because docs.opencv.org seems to be down at the moment.
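
For reference, the convex hull idea mentioned above can be sketched without any library. This is only an illustration on made-up points (OpenCV's own `cv2.convexHull`, and `cv2.approxPolyDP` for polygon simplification, would be the usual tools); the function names and the sample outline are hypothetical. It uses Andrew's monotone chain algorithm and drops interior and collinear points, so a clean hexagonal outline collapses to its six corners.

```python
def cross(o, a, b):
    """Z component of the cross product (a - o) x (b - o)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def convex_hull(points):
    """Convex hull by Andrew's monotone chain; collinear points are dropped."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower = []
    for p in pts:
        # Pop the last point while it does not make a strict left turn.
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other one.
    return lower[:-1] + upper[:-1]

# Made-up outline: six corners, extra points along the edges, two interior points.
outline = [(0, 2), (2, 0), (6, 0), (8, 2), (6, 4), (2, 4),
           (4, 0), (4, 4), (1, 1), (7, 1), (7, 3), (1, 3),
           (4, 2), (3, 1)]
hull = convex_hull(outline)
print(len(hull), hull)
```

On a real, noisy contour the hull would still keep many nearly collinear vertices, so a polygon approximation step with a pixel tolerance would probably be needed afterwards to get down to six corners.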

GoodFeaturesToTrack frequently has false positives/negatives in detecting the six/seven visible corners of the block, even under seemingly ideal conditions. As an alternative, I suppose I could run edge detection, then Hough lines, then pair lines together based off of similarity of angle, and look for the outline of my block in triplets of pairs of lines... but I have the feeling that this is not the proper way to approach the problem. Hence why I am asking for insight.

PS: I started using OCV 2.4 about 5 days ago. I started using Python at the encouragement of my team leaders. Should I bite the bullet and learn to use OCV in C++ instead of Python? I understand the OCV C++ code that I have seen. I am one of the most capable programmers (if not the most capable) on either team and have never used SWIG or Boost.Python ...


That is a lot of questions; maybe you should break them down into several smaller, direct questions. It would also help to post pictures of the intermediate steps you describe, with an explanation of what is wrong with them and what you want to achieve. As for docs.opencv.org being down, it is very likely that your network is blocking it. It happens to me at work because the website's IP is somehow on a blacklist.

( 2012-09-07 15:11:00 -0600 )

Unfortunately it'll be a few days before I have the time to post pictures, but I will.

I don't think that docs.opencv.org was blocked: it worked the day before, and my employer's network admin is exceedingly generous.

I only really care about the two questions in bold. That is why I segmented off the summary of my attempts so far and said that any questions asked in there were not the main thrust of this post. I assume that the lesser questions are things that will become apparent to me in time.

To reiterate: how would a knowledgeable person go about finding the orientation & distance of a distinctively colored block, given that almost everything else is constant? Should I instead be trying to localize with fastSLAM, or some such? Also, am I better served by writing my code in C++?

( 2012-09-08 07:48:36 -0600 )


If you know the real-world dimensions of the block and you have detected its corners, you can compute the ratio between pixel measurements and real-world dimensions.
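
As a rough illustration of that ratio, here is a minimal sketch assuming an ideal pinhole camera with no lens distortion and a block edge that is roughly parallel to the image plane. The function names and the camera numbers (640 px wide image, 90 degree horizontal FOV, a 0.10 m edge spanning 40 px) are made up for the example.

```python
import math

def focal_length_px(image_width_px, horizontal_fov_deg):
    """Focal length in pixels of an ideal pinhole camera with the given FOV."""
    return (image_width_px / 2.0) / math.tan(math.radians(horizontal_fov_deg) / 2.0)

def distance_from_apparent_size(real_size, pixel_size, focal_px):
    """Distance along the optical axis, in the same unit as real_size.

    Similar triangles: pixel_size / focal_px == real_size / distance.
    """
    return focal_px * real_size / pixel_size

# Hypothetical camera and measurement, not taken from the question.
f = focal_length_px(640, 90.0)
d = distance_from_apparent_size(0.10, 40.0, f)
print("focal length (px):", f)
print("estimated distance (m):", d)
```

The estimate degrades when the edge is viewed at a steep angle, because foreshortening makes it span fewer pixels than the model assumes.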

One of the most popular books on this topic is "Multiple View Geometry in Computer Vision" by Hartley and Zisserman. Since you have a maths background, you will love it.

I think SLAM-based approaches are currently the most advanced techniques for estimating a camera's position in the real world, but maybe you should start with simpler techniques.
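
One such simpler technique, given that the camera's height and pitch are known and the block sits on the floor, is to intersect the viewing ray of a pixel with the floor plane. The sketch below assumes an ideal pinhole camera with square pixels, no roll and no lens distortion; the function names and all numeric values are hypothetical. Mapping two bottom corners of the block to the floor gives both a distance and an orientation.

```python
import math

def pixel_to_floor(u, v, cam_height, pitch_deg, focal_px, cx, cy):
    """Intersect the viewing ray through pixel (u, v) with the floor.

    Returns (x, z): lateral offset (right positive) and forward distance,
    measured on the floor from the point directly below the camera,
    in the same unit as cam_height.
    """
    # Ray direction in camera coordinates (x right, y down, z along the optical axis).
    x_c = (u - cx) / focal_px
    y_c = (v - cy) / focal_px
    pitch = math.radians(pitch_deg)  # positive pitch = camera tilted downward
    # Rotate the ray about the camera's x axis into a level, floor-aligned frame.
    down = y_c * math.cos(pitch) + math.sin(pitch)
    forward = math.cos(pitch) - y_c * math.sin(pitch)
    if down <= 0:
        raise ValueError("pixel is at or above the horizon; its ray never reaches the floor")
    scale = cam_height / down
    return x_c * scale, forward * scale

def edge_yaw_deg(p1, p2):
    """Angle of the floor segment p1 -> p2 relative to the robot's lateral (x) axis."""
    return math.degrees(math.atan2(p2[1] - p1[1], p2[0] - p1[0]))

# Hypothetical setup: camera 0.5 m high, pitched 30 degrees down,
# 640x480 image with principal point (320, 240) and a 500 px focal length.
left = pixel_to_floor(280, 300, 0.5, 30.0, 500.0, 320.0, 240.0)
right = pixel_to_floor(380, 290, 0.5, 30.0, 500.0, 320.0, 240.0)
print("left corner (x, z):", left)
print("distance to left corner:", math.hypot(*left))
print("edge yaw (deg):", edge_yaw_deg(left, right))
```

This only works for points that actually touch the floor, such as the block's bottom corners. Corners on the top face would need the block's known height to be taken into account.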

As for Python or C++, I am not experienced with the Python version, but you can base your decision on these points:

• Python is considered easier to learn and program than C++.
• In theory, C++ runs faster than Python.
• You will find that the official and unofficial OpenCV documentation is more oriented towards C++, but you will also find help for Python.
• OpenCV is more complete in C++, but the differences are not that meaningful in my opinion.
• If you need to interact with other C++ code, your life will be easier if your code is C++.