
It depends on the capture conditions and on the buses you want to recognize; some cases would be really difficult.

The simplest case would be a fixed camera pointing at a street, where the buses drive towards it with their fronts facing the camera.

Preparing a recognition database:

  1. Obtain reference images of the buses in poses similar to the ones you expect that camera to see.
  2. Train a classifier with their keypoints and descriptors.
  3. Manually separate the planar sections of each reference image, along with their respective sets of keypoints/descriptors.
  4. (optional) Record the real-world dimensions of these planar sections.
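As a rough illustration of how such a database could be organized, here is a minimal pure-Python sketch. It assumes binary (ORB-like) descriptors packed into Python ints; `PlanarSection`, `BusModel`, `ratio_test_matches` and `classify` are hypothetical names, not an OpenCV API. Instead of an SVM or vocabulary tree it swaps in a plain nearest-neighbour vote with Lowe's ratio test, which is only a stand-in for a properly trained classifier. With OpenCV you would typically compute the descriptors with something like `cv2.ORB_create().detectAndCompute` and match them with `cv2.BFMatcher(cv2.NORM_HAMMING)`.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float]


@dataclass
class PlanarSection:
    """A manually delimited planar region of one reference image (hypothetical layout)."""
    name: str
    keypoints: List[Point]   # keypoint positions in the reference image, in pixels
    descriptors: List[int]   # binary descriptors packed into Python ints (assumption)
    size_m: Optional[Tuple[float, float]] = None  # optional real-world (width, height) in metres


@dataclass
class BusModel:
    bus_type: str
    sections: List[PlanarSection] = field(default_factory=list)

    def all_descriptors(self) -> List[int]:
        # Concatenate the descriptors of all sections, preserving section order.
        return [d for s in self.sections for d in s.descriptors]


def hamming(a: int, b: int) -> int:
    # Number of differing bits between two binary descriptors.
    return bin(a ^ b).count("1")


def ratio_test_matches(query: List[int], train: List[int], ratio: float = 0.8) -> List[Tuple[int, int]]:
    """Nearest-neighbour matching with Lowe's ratio test; returns (query_idx, train_idx) pairs."""
    matches: List[Tuple[int, int]] = []
    if len(train) < 2:
        return matches  # the ratio test needs a second-best neighbour
    for qi, q in enumerate(query):
        dists = sorted((hamming(q, t), ti) for ti, t in enumerate(train))
        (d1, best), (d2, _) = dists[0], dists[1]
        if d1 < ratio * d2:  # keep only clearly unambiguous matches
            matches.append((qi, best))
    return matches


def classify(query: List[int], database: Dict[str, BusModel], min_matches: int = 3) -> Optional[str]:
    """Return the bus type with the most ratio-test matches, or None if no type has enough."""
    best_type: Optional[str] = None
    best_count = 0
    for bus_type, model in database.items():
        count = len(ratio_test_matches(query, model.all_descriptors()))
        if count > best_count:
            best_type, best_count = bus_type, count
    return best_type if best_count >= min_matches else None


# Toy database with 16-bit "descriptors" (illustrative values only).
database = {
    "bus_A": BusModel("bus_A", [
        PlanarSection("front", [(10.0, 20.0), (30.0, 20.0)], [0x00FF, 0xFF00], size_m=(2.5, 3.0)),
        PlanarSection("side", [(50.0, 40.0), (70.0, 40.0)], [0x0F0F, 0xF0F0]),
    ]),
    "bus_B": BusModel("bus_B", [
        PlanarSection("front", [(12.0, 18.0), (28.0, 22.0), (40.0, 30.0), (60.0, 35.0)],
                      [0x3333, 0xCCCC, 0x5555, 0xAAAA]),
    ]),
}
query = [0x00FE, 0xFF01, 0x0F0F]  # descriptors of a segmented candidate object
print(classify(query, database))
```

Keeping the sections separate inside each bus model means the matches found for the winning bus type can later be split per planar section for the homography check.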

A possible algorithm:

  1. Remove the background (street, trees)
  2. Segment the non-background objects (occlusion may cause issues)
  3. Detect keypoints and compute descriptors for each segmented object
  4. Use a classifier (SVM, vocabulary tree, etc.) to verify whether it is a bus
  5. Match the candidate object's features against those of the bus type determined by the classifier
  6. For each planar section, try to compute a homography from the matched features.
  7. If one or more homographies are consistent, you have detected the bus.
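Steps 6 and 7 boil down to a robust homography fit per planar section. In OpenCV this is usually `cv2.findHomography(src, dst, cv2.RANSAC, 3.0)`, which returns the homography and an inlier mask; the pure-Python sketch below only illustrates the idea. It fits an exact homography to random 4-point samples (DLT with h33 fixed to 1, solved by Gaussian elimination), keeps the model with the most inliers, and accepts the section if enough matches agree. The 3-pixel threshold and the acceptance rule in the hypothetical `section_is_consistent` (at least 10 inliers and at least half of the matches) are assumptions to be tuned.

```python
import random
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float]
Homography = List[float]  # [h0..h7]; h8 is fixed to 1


def solve_homography(src: Sequence[Point], dst: Sequence[Point], eps: float = 1e-9) -> Optional[Homography]:
    """Exact homography from 4 correspondences via Gauss-Jordan elimination on the 8x8 DLT system."""
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([x, y, 1.0, 0.0, 0.0, 0.0, -x * u, -y * u, u])
        rows.append([0.0, 0.0, 0.0, x, y, 1.0, -x * v, -y * v, v])
    n = 8
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(rows[r][col]))
        if abs(rows[pivot][col]) < eps:
            return None  # degenerate sample, e.g. three collinear points
        rows[col], rows[pivot] = rows[pivot], rows[col]
        for r in range(n):
            if r != col:
                f = rows[r][col] / rows[col][col]
                rows[r] = [a - f * b for a, b in zip(rows[r], rows[col])]
    return [rows[i][8] / rows[i][i] for i in range(n)]


def project(h: Homography, p: Point) -> Optional[Point]:
    x, y = p
    w = h[6] * x + h[7] * y + 1.0
    if abs(w) < 1e-12:
        return None  # the point maps to infinity
    return ((h[0] * x + h[1] * y + h[2]) / w, (h[3] * x + h[4] * y + h[5]) / w)


def reprojection_error(h: Homography, p: Point, q: Point) -> float:
    r = project(h, p)
    if r is None:
        return float("inf")
    return ((r[0] - q[0]) ** 2 + (r[1] - q[1]) ** 2) ** 0.5


def ransac_homography(src, dst, iters=200, thresh=3.0, seed=0):
    """Minimal RANSAC: fit on random 4-point samples, keep the model with the most inliers."""
    rng = random.Random(seed)
    best_h, best_inliers = None, []
    if len(src) < 4:
        return best_h, best_inliers
    for _ in range(iters):
        idx = rng.sample(range(len(src)), 4)
        h = solve_homography([src[i] for i in idx], [dst[i] for i in idx])
        if h is None:
            continue
        inliers = [i for i in range(len(src)) if reprojection_error(h, src[i], dst[i]) < thresh]
        if len(inliers) > len(best_inliers):
            best_h, best_inliers = h, inliers
    return best_h, best_inliers


def section_is_consistent(src, dst, min_inliers=10, min_ratio=0.5) -> bool:
    """Hypothetical acceptance rule for step 7."""
    h, inliers = ransac_homography(src, dst)
    return h is not None and len(inliers) >= min_inliers and len(inliers) >= min_ratio * len(src)


# Synthetic example: 20 correct matches plus 5 wrong ones (illustrative data only).
rng = random.Random(1)
H_TRUE = [1.2, 0.1, 5.0, 0.05, 0.9, -3.0, 0.001, 0.0005]
ref_pts = [(rng.uniform(0, 100), rng.uniform(0, 100)) for _ in range(20)]
obs_pts = [project(H_TRUE, p) for p in ref_pts]
ref_pts += [(rng.uniform(0, 100), rng.uniform(0, 100)) for _ in range(5)]
obs_pts += [(500.0, -500.0), (-400.0, 600.0), (900.0, 900.0), (-800.0, -800.0), (1000.0, -1000.0)]
h, inliers = ransac_homography(ref_pts, obs_pts)
print(len(inliers), section_is_consistent(ref_pts, obs_pts))
```

Fitting on minimal 4-point samples keeps each hypothesis cheap; a real implementation would normally also re-fit the final homography on all inliers with least squares, which `cv2.findHomography` already takes care of.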

If you want to determine its distance (optional):

  1. Refine the positions of the matched features with a template matching approach.
  2. If the refinement succeeds, use PnP with the matched points and the real-world dimensions of the planar section to determine its distance (you will need a calibrated camera).
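For the distance part, the sketch below shows two simplified pieces, assuming grayscale images stored as nested Python lists. The hypothetical `refine_by_template` re-locates a matched feature by minimizing the sum of squared differences (SSD) within a small search window, and reports failure when even the best placement is poor. The hypothetical `distance_pinhole` uses the similar-triangles relation Z = f * W / w, where f is the focal length in pixels (from calibration), W the real width of the planar section and w its width in the image. This is a deliberate simplification that only holds for a section roughly facing the camera; in the general case you would use `cv2.solvePnP(objectPoints, imagePoints, cameraMatrix, distCoeffs)`, whose returned translation vector gives the section's position relative to the camera.

```python
from typing import List, Optional, Tuple

Image = List[List[float]]  # grayscale image, indexed as image[row][col]


def refine_by_template(image: Image, template: Image, guess: Tuple[int, int],
                       radius: int = 3, max_mean_sq_diff: float = 100.0) -> Optional[Tuple[int, int]]:
    """Search +/- radius pixels around guess (x, y) for the top-left position with minimal SSD."""
    th, tw = len(template), len(template[0])
    ih, iw = len(image), len(image[0])
    gx, gy = guess
    best = None  # (ssd, (x, y))
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            x, y = gx + dx, gy + dy
            if x < 0 or y < 0 or x + tw > iw or y + th > ih:
                continue  # template would fall outside the image
            ssd = sum((image[y + i][x + j] - template[i][j]) ** 2
                      for i in range(th) for j in range(tw))
            if best is None or ssd < best[0]:
                best = (ssd, (x, y))
    if best is None or best[0] / (th * tw) > max_mean_sq_diff:
        return None  # refinement failed: no placement is similar enough
    return best[1]


def distance_pinhole(focal_px: float, real_width_m: float, width_px: float) -> float:
    """Z = f * W / w for a planar section facing the camera (simplification of PnP)."""
    if width_px <= 0:
        raise ValueError("width_px must be positive")
    return focal_px * real_width_m / width_px


# Toy example: a 3x3 pattern placed with its top-left corner at x=5, y=4.
img = [[0.0] * 10 for _ in range(10)]
pattern = [[10.0, 20.0, 30.0], [40.0, 50.0, 60.0], [70.0, 80.0, 90.0]]
for i in range(3):
    for j in range(3):
        img[4 + i][5 + j] = pattern[i][j]

print(refine_by_template(img, pattern, (3, 3)))  # refined (x, y) position
print(distance_pinhole(1000.0, 2.5, 125.0))       # illustrative focal length and bus width
```

The SSD threshold is what turns "refinement succeeds" into a concrete test; its value here is only a placeholder and would depend on image contrast and noise.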

Of course, this will need some tweaking: choosing the best keypoint detector/feature descriptor, etc.

Note that with a mobile camera (smartphone, tablet, GoPro, etc.) it is much more difficult: you will have to train the classifier with several poses of each bus, and tasks such as background removal and distance estimation would be harder, if not impossible.