
How hard this is depends on the capture conditions and on the buses you want to recognize. Some cases would be really difficult.

The simplest case would be a camera in a fixed position pointing at a street, where the bus approaches facing the camera.

Preparing a recognition database:

Obtain reference images of the buses in poses similar to the ones you expect the camera to see.
Train a classifier with their keypoints and descriptors.
Separate the planar sections of each reference image (manually), along with their respective sets of keypoints/descriptors.
(optional) Keep the real-world dimensions of those planar sections.
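
One way to organize such a database is sketched below. This is only an illustration: the names `PlanarSection`, `BusReference`, `add_section` and the `"city_bus"` entry are hypothetical, and the keypoints and descriptors are placeholder arrays rather than the output of a real detector.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

import numpy as np


@dataclass
class PlanarSection:
    """One manually separated planar region of a reference image (hypothetical layout)."""
    name: str
    keypoints_px: np.ndarray                      # shape (N, 2), pixel coordinates
    descriptors: np.ndarray                       # shape (N, D), one row per keypoint
    size_m: Optional[Tuple[float, float]] = None  # optional (width, height) in meters

    def __post_init__(self):
        # Every keypoint must have exactly one descriptor row.
        if len(self.keypoints_px) != len(self.descriptors):
            raise ValueError("keypoints and descriptors must have the same length")


@dataclass
class BusReference:
    """All planar sections stored for one bus type."""
    bus_type: str
    sections: List[PlanarSection] = field(default_factory=list)


def add_section(database: Dict[str, BusReference], bus_type: str,
                section: PlanarSection) -> None:
    """Store a section under its bus type, creating the entry if needed."""
    database.setdefault(bus_type, BusReference(bus_type)).sections.append(section)


# Placeholder data: 5 keypoints with 32-byte (ORB-sized) descriptors.
database: Dict[str, BusReference] = {}
front = PlanarSection(
    name="front_panel",
    keypoints_px=np.zeros((5, 2), dtype=np.float32),
    descriptors=np.zeros((5, 32), dtype=np.uint8),
    size_m=(2.5, 3.0),  # assumed dimensions, for illustration only
)
add_section(database, "city_bus", front)
```

Keeping the sections per bus type makes it easy to fetch only the features of the type the classifier reports.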

A possible algorithm:

Remove the background (street, trees).
Segment the non-background objects (occlusion may cause problems here).
Detect keypoints and compute descriptors for each segmented object.
Use a classifier (SVM, vocabulary tree, etc.) to verify whether it is a bus.
Match the candidate object's features with the features of the bus type determined by the classifier.
For each planar section, try to compute a homography from the matched features.
If one or more homographies are consistent, you have detected the bus.

If you wish to determine its distance (optional):

Refine the matched features with a template matching approach.
If the refinement succeeds, use PnP with the matched points and their real-world dimensions to determine the distance (you will need a calibrated camera).

Of course, this will need a bit of tweaking: determining the best keypoint detector and feature descriptor, etc.

Note that with a mobile camera (smartphone, tablet, GoPro, etc.) this is much more difficult: you will have to train the classifier with several poses of the buses. Background removal and distance estimation would also be harder, if not impossible.