key-point based detection vs HoG/Haar training

asked 2014-03-14 05:00:41 -0600

sy456


I need to decide between key-point based detection and HoG/Haar training. Sorry in advance for the long question, but I am really stuck!

So far, I have been trying to use SIFT, SURF and other key-point based feature extraction methods to detect and track vehicles, pedestrians, traffic signs and lanes. I have to detect all of these objects at the same time with a moving camera. I applied the following approach to two consecutive frames to analyse the movement of the key-points:

detect features --> describe features --> match features between frames --> filter matches
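
The last two steps of this pipeline (match, then filter) can be sketched as follows. This is only a minimal illustration in plain Python, assuming descriptors are already available as lists of floats; the function names and the toy descriptors are hypothetical, and the filter shown is the nearest-neighbour ratio test, which is one common choice rather than the only one.

```python
import math

def l2(a, b):
    """Euclidean distance between two descriptor vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def match_with_ratio_test(desc_prev, desc_curr, ratio=0.75):
    """Return (i, j) pairs: descriptor i of the previous frame matched to j of the current one.

    A match is kept only if the best distance is clearly smaller than the
    second-best distance (ratio test), so ambiguous matches are dropped.
    """
    matches = []
    for i, d in enumerate(desc_prev):
        # Rank all candidates in the current frame by distance to d.
        ranked = sorted((l2(d, c), j) for j, c in enumerate(desc_curr))
        if len(ranked) < 2:
            continue  # the ratio test needs at least two candidates
        (best, j), (second, _) = ranked[0], ranked[1]
        if best < ratio * second:
            matches.append((i, j))
    return matches

# Hypothetical toy descriptors (real SIFT descriptors have 128 dimensions).
prev_desc = [[0.0, 0.0, 1.0], [1.0, 0.0, 0.0], [0.5, 0.5, 0.5]]
curr_desc = [[1.0, 0.1, 0.0], [0.0, 0.1, 1.0], [0.4, 0.6, 0.5], [0.6, 0.4, 0.5]]
matches = match_with_ratio_test(prev_desc, curr_desc)
print(matches)
```

In this toy data the third descriptor has two almost equally good candidates, so the ratio test discards it while the two unambiguous matches survive.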

After that, I want to group the features onto the cars, pedestrians, traffic signs and lanes. I think there should be some way to achieve this. I need to reduce the data inside the camera because HD cameras produce large data streams. I thought that with this approach I could create a cheap vision pipeline without using any trained data.
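
One possible reading of the grouping idea, sketched in plain Python: link matched key-points that are close together and move alike, then take the bounding box of each group. Everything here (function names, thresholds, toy tracks) is an assumption for illustration. The sketch ignores camera ego-motion, and it only produces unlabeled groups, not object classes.

```python
import math

def group_tracks(tracks, max_dist=50.0, max_flow_diff=2.0):
    """Group key-point tracks that are near each other and have similar motion.

    tracks: list of ((x0, y0), (x1, y1)), the position in the previous and current frame.
    Returns a list of groups, each a sorted list of track indices.
    """
    def linked(a, b):
        (ax0, ay0), (ax1, ay1) = a
        (bx0, by0), (bx1, by1) = b
        near = math.hypot(ax1 - bx1, ay1 - by1) <= max_dist
        # Difference between the two displacement vectors.
        flow_diff = math.hypot((ax1 - ax0) - (bx1 - bx0), (ay1 - ay0) - (by1 - by0))
        return near and flow_diff <= max_flow_diff

    unvisited = set(range(len(tracks)))
    groups = []
    while unvisited:
        seed = min(unvisited)
        unvisited.remove(seed)
        group, stack = [seed], [seed]
        while stack:  # flood fill over the "linked" relation
            i = stack.pop()
            for j in sorted(unvisited):
                if linked(tracks[i], tracks[j]):
                    unvisited.remove(j)
                    group.append(j)
                    stack.append(j)
        groups.append(sorted(group))
    return groups

def bounding_box(tracks, group):
    """Axis-aligned box (xmin, ymin, xmax, ymax) around current-frame positions."""
    xs = [tracks[i][1][0] for i in group]
    ys = [tracks[i][1][1] for i in group]
    return (min(xs), min(ys), max(xs), max(ys))

# Hypothetical tracks: three points drifting right, two points drifting left.
tracks = [
    ((100, 100), (105, 100)),
    ((120, 110), (125, 110)),
    ((110, 130), (115, 131)),
    ((300, 200), (298, 200)),
    ((320, 210), (318, 210)),
]
groups = group_tracks(tracks)
boxes = [bounding_box(tracks, g) for g in groups]
print(groups, boxes)
```

A grouping like this can give candidate regions cheaply, but deciding whether a region is a car or a pedestrian still needs some model of what those objects look like.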

However, when I read research papers and talk to experts, I see that if you want to draw a bounding box around an object, you mostly need trained data. Most people use HoG or Haar features and feed them to a classifier (SVM/Cascade) trained for specific objects. Why are HoG and Haar mostly preferred by the community over SIFT or SURF? I cannot convince myself to switch to HoG or Haar!

Also, people use different detectors specific to each object. For instance: HoG + SVM for pedestrian detection; Haar + Cascade for vehicle detection; edge detection + Hough transform + line fitting for lanes, etc. What I want to do is find the commonalities and variabilities of these different pipelines and (if possible) come up with a pipeline that is as generic as possible.
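
For readers comparing these pipelines, the core of a HoG feature can be sketched in a few lines of plain Python: a histogram of gradient orientations over a small cell, weighted by gradient magnitude. This is a simplified sketch under stated assumptions (hard binning, central differences on interior pixels, no block normalization); the function name and the toy cell are hypothetical.

```python
import math

def cell_histogram(cell, bins=9):
    """Unsigned gradient-orientation histogram for one grayscale cell (list of rows)."""
    hist = [0.0] * bins
    bin_width = 180.0 / bins
    for y in range(1, len(cell) - 1):
        for x in range(1, len(cell[0]) - 1):
            gx = cell[y][x + 1] - cell[y][x - 1]
            gy = cell[y + 1][x] - cell[y - 1][x]
            magnitude = math.hypot(gx, gy)
            # Unsigned orientation in [0, 180): opposite directions share a bin.
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            hist[int(angle // bin_width) % bins] += magnitude
    return hist

# Hypothetical 6x6 cell with a vertical edge: dark left half, bright right half.
edge_cell = [[0, 0, 0, 255, 255, 255] for _ in range(6)]
hist = cell_histogram(edge_cell)
print(hist)
```

Because the histogram is computed densely over every cell, it describes the outline of an object even where no distinctive key-point exists; all the gradient energy of the toy edge lands in the first orientation bin.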

Any advice or pointers to resources?


1 answer

answered 2014-03-14 15:58:15 -0600

yes123

updated 2014-03-14 19:56:26 -0600

SURF, SIFT, etc. have a big issue: texture-less objects can't be detected with them because you will not find any keypoints on them. Pedestrians are most of the time best described by their shape and, depending on the camera quality/speed, you will find only very low-quality keypoints on them.

But if you need to detect textured objects (especially if you have good-quality video sequences), keypoint-based techniques are generally better. There is a lot of research on the topic (even keypoint-based methods with online learning). For an example of a tracking algorithm, see: Matrioska: Tracking By detection using Keypoints


Thank you for the answer and the link. What do you mean by texture-less objects?

sy456 (2014-03-16 14:37:18 -0600)

A textureless object is an object without strong edges inside it. Think of a sheet of white paper on a desk. Google it for more information.

yes123 (2014-03-16 15:19:11 -0600)
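
To make the texture-less point concrete, here is a small plain-Python sketch of a corner score (the smaller eigenvalue of the gradient structure tensor, in the style of Shi-Tomasi). It is only an illustration: SIFT and SURF use different detectors, but they likewise need local intensity variation to fire. The function name and toy patches are hypothetical.

```python
def min_eigenvalue(patch):
    """Smaller eigenvalue of the gradient structure tensor summed over a patch.

    The score is large only when the patch has strong gradients in two
    different directions, which is what makes a point easy to re-find.
    """
    sxx = syy = sxy = 0.0
    for y in range(1, len(patch) - 1):
        for x in range(1, len(patch[0]) - 1):
            gx = (patch[y][x + 1] - patch[y][x - 1]) / 2.0
            gy = (patch[y + 1][x] - patch[y - 1][x]) / 2.0
            sxx += gx * gx
            syy += gy * gy
            sxy += gx * gy
    trace = sxx + syy
    det = sxx * syy - sxy * sxy
    disc = max(trace * trace / 4.0 - det, 0.0)
    return trace / 2.0 - disc ** 0.5

# Hypothetical 7x7 grayscale patches.
flat = [[200] * 7 for _ in range(7)]                      # white paper
edge = [[0, 0, 0, 255, 255, 255, 255] for _ in range(7)]  # a single straight edge
corner = [[255 if (x >= 3 and y >= 3) else 0 for x in range(7)] for y in range(7)]

scores = {name: min_eigenvalue(p) for name, p in
          [("flat", flat), ("edge", edge), ("corner", corner)]}
print(scores)
```

The flat patch and the straight edge both score essentially zero, so a detector of this kind finds nothing to anchor on there, while the corner patch scores high.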

Seen: 2,290 times

Last updated: Mar 14 '14