Stereo vision and object detection.

asked 2014-01-15 10:30:06 -0600

SpiderGears

updated 2014-01-17 02:17:00 -0600

Hey! I am new to OpenCV. I want to implement stereo vision, so I bought two webcams from iBall. They do not seem to have an autofocus feature. Does this affect performance and results?

a) Which works better: cameras with autofocus or without?

b) I would also like to use the same camera feed for object detection and recognition at a range of up to at least 5 m. How does this affect the results?

c) Is there any other camera that you would suggest for this?

All tasks are to be performed in real time.

Once everything is set up, I want to move the system to an embedded board such as a BeagleBone or something similar. Once the project repository is set up, any contribution from the community is welcome and will be hugely appreciated.




a: Autofocus is a total PITA if you want stereo calibration. Consider yourself lucky!

berak ( 2014-01-16 03:21:53 -0600 )

Thanks @berak, now I understand your point.

SpiderGears ( 2014-01-17 02:18:03 -0600 )

2 answers

answered 2014-01-16 03:12:13 -0600

jensenb

updated 2014-01-16 08:11:33 -0600

In response to your questions:

  1. If you want to do any sort of geometric reasoning about your scene, then autofocus is absolutely taboo. Autofocus changes the intrinsic camera parameters, which makes estimating high-quality metric scene geometry nearly impossible, and certainly not in real time. The typical workflow with stereo cameras is that they are manually focused and rigidly aligned for the specific type of scene, then calibrated, and then left unchanged.
  2. Autofocus may not have any effect on object recognition, depending on your detector type. If the detector uses only 2D image information, autofocus will likely have little effect, but if it requires 3D information, autofocus will prevent reliable detection.
  3. There are special stereo cameras like the Bumblebee you can purchase.
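
To make point 1 concrete, here is a minimal sketch of the standard pinhole relation for a rectified, parallel stereo pair, Z = f * B / d. The focal length, baseline, disparity and the 3% focal shift are made-up numbers for illustration only, and real autofocus also moves the principal point and the lens distortion, which this sketch ignores.

```python
def depth_from_disparity(focal_px, baseline_m, disparity_px):
    """Depth Z = f * B / d for a rectified, parallel stereo pair."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / disparity_px


# Hypothetical values, chosen only for illustration.
CALIBRATED_FOCAL_PX = 700.0  # focal length (in pixels) at calibration time
BASELINE_M = 0.10            # distance between the two camera centres
disparity_px = 14.0          # measured disparity of some scene point

# Depth computed with the stale calibration.
assumed_depth = depth_from_disparity(CALIBRATED_FOCAL_PX, BASELINE_M, disparity_px)

# Assumption: autofocus changed the effective focal length by 3% afterwards.
true_depth = depth_from_disparity(CALIBRATED_FOCAL_PX * 1.03, BASELINE_M, disparity_px)

depth_error_m = true_depth - assumed_depth
print(f"assumed {assumed_depth:.2f} m, actual {true_depth:.2f} m, "
      f"error {depth_error_m:.2f} m")
```

Under this simplified model the depth error is proportional to the relative error in focal length, so a calibration that was valid before a refocus no longer yields metric depth.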

Since it seems you are unfamiliar with stereo vision processing, I recommend reading the chapter on stereo vision in Szeliski. You should also be clearer about your requirements: are you primarily interested in object detection, or in 3D reconstruction using stereo cameras? Both are active research topics, and they are only partially related to each other, i.e. you can do object detection without stereo reconstruction.

EDIT: I forgot to mention one important aspect of implementing stereo vision. Most (all that I am aware of) stereo reconstruction algorithms assume that both images were taken at exactly the same time, i.e. that the scene did not change between them. If your scene is not static (people or objects are moving in it), you need synchronized cameras (like the Bumblebee). USB webcams can rarely be reliably synchronized, for a variety of reasons.
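
As a rough illustration of why synchronization matters, the sketch below estimates how far a laterally moving object shifts in the image during the time skew between the two exposures, using the pinhole projection. All numbers (focal length, baseline, walking speed, one frame of skew at 30 fps) are assumptions, and the sign of the error depends on the direction of motion.

```python
def skew_disparity_error_px(focal_px, depth_m, lateral_speed_mps, skew_s):
    """Apparent image shift (pixels) of an object that moves sideways
    during the time skew between the left and right exposures."""
    return focal_px * lateral_speed_mps * skew_s / depth_m


# Hypothetical values, chosen only for illustration.
FOCAL_PX = 700.0
BASELINE_M = 0.10
DEPTH_M = 5.0
WALKING_SPEED_MPS = 1.5
SKEW_S = 1.0 / 30.0  # cameras off by one frame at 30 fps

true_disparity = FOCAL_PX * BASELINE_M / DEPTH_M
error_px = skew_disparity_error_px(FOCAL_PX, DEPTH_M, WALKING_SPEED_MPS, SKEW_S)

# Worst case for this direction of motion: the shift adds to the disparity.
apparent_disparity = true_disparity + error_px
apparent_depth = FOCAL_PX * BASELINE_M / apparent_disparity

print(f"true disparity {true_disparity:.1f} px, skew error {error_px:.1f} px")
print(f"object at {DEPTH_M:.1f} m appears at {apparent_depth:.2f} m")
```

With these assumed numbers the skew-induced shift is a large fraction of the true disparity, which is why unsynchronized cameras are a problem for moving scenes even when the calibration is good.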


I am not aiming for industry-level precision in my work, but it should be reasonably reliable. Basically, what I want is a prototype system that helps the blind (sorry, can't find the words) navigate using OpenCV. The broad goals are: a) recognize objects in the scene, such as a staircase and other possible obstacles like a chair or something lying in front; b) free path detection; c) calculate the distance to the recognized objects.

So what are my odds of getting this right? And, if needed, how do I synchronize the USB cameras? Since I intend to make it a wearable device, mobility is expected. PS: The specialized cameras you referred to seem expensive to me. My basic aim is to have a system that is affordable to all.

Any comments and guidance offered will be valuable.

SpiderGears ( 2014-01-17 02:12:02 -0600 )

Yes, specialized stereo cameras are more expensive than building your own stereo rig from two independent (cheaper) cameras, but they are usually a better-optimized design (synchronized image capture etc.). It sounds like you are mainly interested in getting 3D and color information from indoor scenes, so you might want to check out a 3D scanner like a Kinect or an Asus Xtion Pro. Compared with stereo, they have the advantage that you get real-time depth information directly from the device, whereas with stereo webcams you would have to calculate it yourself (using OpenCV or otherwise).

jensenb ( 2014-01-17 06:02:07 -0600 )

answered 2014-01-16 09:02:42 -0600

Nghia

If you're going down the two-webcam path and want to do simple parallel, forward-facing stereo, then I highly recommend you build a solid physical rig for it. The slightest pixel misalignment will mess up any 1D block matching algorithm.

Back in undergrad I was using two Logitech ball-shaped webcams. I made two wooden pieces with holes for the cameras to sit in, then used screws to clamp the wood together. Here's some poor ASCII art:
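
The sensitivity to vertical misalignment can be sketched with a toy 1D block matcher. This is a minimal pure-Python illustration using sum of absolute differences (SAD) on a synthetic random texture, not OpenCV's implementation; the image size, window size, true disparity and the `row_offset` parameter are all made up for the demo. Random noise is close to a worst case, since real images are usually smoother and degrade less abruptly.

```python
import random


def sad(left, right, row_l, col_l, row_r, col_r, half):
    """Sum of absolute differences between two square windows."""
    total = 0
    for dy in range(-half, half + 1):
        for dx in range(-half, half + 1):
            total += abs(left[row_l + dy][col_l + dx]
                         - right[row_r + dy][col_r + dx])
    return total


def match_along_row(left, right, row, col, half, max_disp, row_offset=0):
    """Search one row of the right image for the best match.
    row_offset simulates a vertical misalignment of the rig (hypothetical)."""
    best_disp, best_cost = 0, None
    for d in range(max_disp + 1):
        cost = sad(left, right, row, col, row + row_offset, col - d, half)
        if best_cost is None or cost < best_cost:
            best_disp, best_cost = d, cost
    return best_disp, best_cost


# Synthetic pair: the right image is the left one shifted by TRUE_DISP pixels.
TRUE_DISP = 6
HEIGHT, WIDTH = 40, 80
rng = random.Random(0)
base = [[rng.randrange(256) for _ in range(WIDTH + TRUE_DISP)]
        for _ in range(HEIGHT)]
left = [r[:WIDTH] for r in base]
right = [r[TRUE_DISP:] for r in base]

aligned = match_along_row(left, right, row=20, col=40, half=2, max_disp=16)
misaligned = match_along_row(left, right, row=20, col=40, half=2,
                             max_disp=16, row_offset=1)
print("aligned (disparity, cost):", aligned)
print("one row off (disparity, cost):", misaligned)
```

With perfectly aligned rows the matcher finds a zero-cost match at the true disparity; with the right image one row off, no candidate matches exactly and the chosen disparity is essentially arbitrary.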

 /--\       |       /--\
 \--/       |       \--/

I was using some rather thin MDF wood, and over time it would deform. In hindsight, I should have gone with much thicker wood.

Calibrating the two cameras was tedious and done by eyeballing. Getting the vertical alignment wasn't so bad, because you can see the two images side by side and make a good guess. To make sure the cameras were close to parallel, I would focus on a very distant object and make sure it was at about the same (x,y) position in both images.
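
The far-object check can be turned into numbers with a small helper. This is only a sketch under stated assumptions: both cameras share the same (hypothetical) focal length in pixels, the object is near the image centre, and it is far enough away that its true disparity is negligible, so any remaining offset is attributed to misalignment.

```python
import math


def alignment_offsets(left_xy, right_xy, focal_px):
    """Pixel offsets of a distant object between the two images, plus the
    approximate angular misalignment they correspond to (in degrees)."""
    dx = left_xy[0] - right_xy[0]  # horizontal offset: toe-in / toe-out
    dy = left_xy[1] - right_xy[1]  # vertical offset: relative pitch
    yaw_deg = math.degrees(math.atan2(dx, focal_px))
    pitch_deg = math.degrees(math.atan2(dy, focal_px))
    return dx, dy, yaw_deg, pitch_deg


# Hypothetical measurement: the same distant landmark clicked in both images.
FOCAL_PX = 700.0
dx, dy, yaw_deg, pitch_deg = alignment_offsets((330, 243), (320, 240), FOCAL_PX)
print(f"dx={dx} px, dy={dy} px, yaw~{yaw_deg:.2f} deg, pitch~{pitch_deg:.2f} deg")
```

Adjusting the rig until both offsets are close to zero is the manual procedure described above; a proper stereo calibration and rectification would replace it in a more careful setup.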


Thanks for the beautiful insight. I am a little more interested in your final results, such as the accuracy obtained and the performance of the system. Did you process the video feed on a computer/laptop or on an embedded board such as a BeagleBone?

If a project repository is maintained, that would be really helpful.

SpiderGears ( 2014-01-17 02:14:49 -0600 )

This was back in 2004. It was a simple collision avoidance system for a mobile robot. I never calculated actual metric distance, just whether something was "close" or "far" based on experimental observation of disparity values. I was processing two 320x240 images in real time (I forget how fast, actually) on a desktop 1 GHz AMD (single core). If you want a feel for stereo accuracy check out this site

Nghia ( 2014-01-17 03:07:31 -0600 )
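
A "close"/"far" decision based on raw disparity values, as described in the comment above, could look roughly like the following. This is a sketch and not necessarily what was done in 2004: the threshold, the `min_fraction` parameter and the convention that non-positive disparities mean "no match" are all assumptions that would have to be tuned experimentally.

```python
def classify_proximity(disparities_px, close_threshold_px, min_fraction=0.2):
    """Label a region 'close' when enough of its valid disparities are large.
    Larger disparity means nearer, so no metric calibration is needed."""
    # Assumption: non-positive values mark pixels where matching failed.
    valid = [d for d in disparities_px if d > 0]
    if not valid:
        return "unknown"
    near = sum(1 for d in valid if d >= close_threshold_px)
    return "close" if near / len(valid) >= min_fraction else "far"


# Hypothetical disparity samples from a region in front of the robot.
print(classify_proximity([30, 28, 0, 5, 31], close_threshold_px=25))
print(classify_proximity([3, 4, 0, 5], close_threshold_px=25))
```

Requiring a minimum fraction of pixels above the threshold is one simple way to keep a few bad matches from triggering a false "close".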

Stereo reconstruction is still a very active research field. The Middlebury stereo benchmark is the standard ranking of stereo reconstruction algorithm performance. It can give you a rough idea of how well some stereo algorithms work.

jensenb ( 2014-01-17 06:06:21 -0600 )
