Key-point based detection vs HoG/Haar training
Hi,
I need to decide between key-point based detection and HoG/Haar training. Sorry in advance for the long question, but I am really stuck!
So far, I have been trying to use SIFT, SURF and other key-point based feature extraction methods to detect and track vehicles, pedestrians, traffic signs and lanes. I have to detect all of these objects at the same time with a moving camera. I apply the following approach to two consecutive frames to analyse the movement of the key-points:
detect features --> describe features --> match features between frames --> filter matches
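To make the last two steps concrete, here is a minimal pure-Python sketch of matching descriptors between two frames and filtering the matches with a nearest/second-nearest ratio test. It is only an illustration on toy data: the function names, the 3-element descriptors and the 0.75 ratio are assumptions, and real SIFT/SURF descriptors would come from a library such as OpenCV (where the same idea is usually done with `BFMatcher.knnMatch(..., k=2)`).

```python
import math

def l2(a, b):
    """Euclidean distance between two descriptors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def match_with_ratio_test(desc_prev, desc_curr, ratio=0.75):
    """Return (i, j) pairs: descriptor i of the previous frame matched to
    descriptor j of the current frame. A match is kept only if the best
    distance is clearly smaller than the second-best one."""
    matches = []
    for i, d in enumerate(desc_prev):
        # Distances to every descriptor in the current frame, nearest first.
        dists = sorted((l2(d, c), j) for j, c in enumerate(desc_curr))
        if len(dists) < 2:
            continue  # the ratio test needs two candidates
        (best, j), (second, _) = dists[0], dists[1]
        if best < ratio * second:
            matches.append((i, j))
    return matches

# Toy descriptors (assumption: 3 values each, real ones have 64 or 128).
prev = [(0.0, 0.0, 1.0), (1.0, 0.0, 0.0), (0.5, 0.5, 0.5)]
curr = [(0.0, 0.1, 1.0), (0.9, 0.0, 0.1), (0.5, 0.5, 0.4), (0.5, 0.5, 0.6)]
matches = match_with_ratio_test(prev, curr)
```

The third descriptor in `prev` has two almost equally good candidates in `curr`, so the ratio test rejects it as ambiguous; only the first two descriptors are matched.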
After that, I want to group the features onto the cars, pedestrians, traffic signs and lanes. I think there should be some way to achieve this. I need to reduce the data inside the camera because HD cameras produce large data streams. I thought that with this approach I could build a cheap vision pipeline without using any trained data.
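One way I imagine the grouping step is to cluster matched key-points that are close together and move in a similar way. The following is only a rough sketch of that idea, not a working detector: the function name, the two thresholds and the toy tracks are assumptions, and it groups by coherent motion only, so it does not tell a car from a pedestrian.

```python
import math

def group_by_motion(tracks, max_dist=50.0, max_flow_diff=3.0):
    """tracks: list of ((x0, y0), (x1, y1)) key-point positions in the
    previous and current frame. Two tracks are linked when their current
    positions are near each other and their displacement vectors are
    similar; linked tracks are merged with a small union-find.
    Returns a sorted list of groups, each a list of track indices."""
    n = len(tracks)
    flows = [(x1 - x0, y1 - y0) for (x0, y0), (x1, y1) in tracks]
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            close = math.dist(tracks[i][1], tracks[j][1]) <= max_dist
            similar = math.dist(flows[i], flows[j]) <= max_flow_diff
            if close and similar:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())

# Toy data (assumption): three points moving right, two moving left.
tracks = [
    ((100, 100), (110, 100)),
    ((120, 105), (130, 105)),
    ((115, 120), (125, 121)),
    ((400, 200), (398, 200)),
    ((410, 230), (408, 231)),
]
groups = group_by_motion(tracks)
```

With a moving camera the background also moves, so in practice the camera motion would have to be compensated before a threshold on the displacement makes sense.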
However, when I read research papers and talk to experts, I see that if you want to draw a bounding box around an object, you mostly need trained data. Most people use HoG or Haar features to train a classifier (SVM/cascade) for specific objects. Why are HoG and Haar preferred by the community over SIFT or SURF? I cannot convince myself to switch to HoG or Haar!
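As far as I understand it, the main practical difference is that HoG describes a whole fixed-size window with a dense grid of gradient-orientation histograms, so every window gives a vector of the same length that can be fed to an SVM, whereas SIFT/SURF return a variable number of sparse key-points. Here is a simplified sketch of that idea. The function names and the 8-pixel cell size are assumptions, and it leaves out the block normalisation and bin interpolation that real HoG uses.

```python
import math

def cell_histogram(cell, bins=9):
    """Unsigned gradient-orientation histogram (0..180 degrees) of one
    grayscale cell, weighted by gradient magnitude. Border pixels of the
    cell are skipped to keep the central differences simple."""
    h, w = len(cell), len(cell[0])
    hist = [0.0] * bins
    bin_width = 180.0 / bins
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = cell[y][x + 1] - cell[y][x - 1]
            gy = cell[y + 1][x] - cell[y - 1][x]
            mag = math.hypot(gx, gy)
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            hist[int(angle // bin_width) % bins] += mag
    return hist

def hog_like_descriptor(image, cell_size=8, bins=9):
    """Concatenate the cell histograms of a window into one vector.
    The length depends only on the window size, not on its content."""
    desc = []
    for cy in range(0, len(image) - cell_size + 1, cell_size):
        for cx in range(0, len(image[0]) - cell_size + 1, cell_size):
            cell = [row[cx:cx + cell_size] for row in image[cy:cy + cell_size]]
            desc.extend(cell_histogram(cell, bins))
    return desc

# Toy 16x16 window (assumption): intensity increases from left to right.
window = [[x for x in range(16)] for _ in range(16)]
desc = hog_like_descriptor(window)
```

For this window the descriptor has 4 cells x 9 bins = 36 values, and all the gradient energy falls into the first orientation bin of each cell because the gradient is purely horizontal.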
Also, people use different detectors for different objects. For instance: HoG + SVM for pedestrian detection; Haar + cascade for vehicle detection; edge detection + Hough transform + line fitting for lanes, etc. What I want to do is to find the commonalities and variabilities of these different pipelines and (if possible) come up with a pipeline that is as generic as possible.
Any advice or pointers to resources?
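To show how different the lane pipeline is from the classifier-based ones, here is a minimal sketch of the Hough transform step: every edge point votes for all lines (rho, theta) passing through it, and the most voted pair is taken as the line. The function name, the 1-degree step and the toy edge points are assumptions, and a real pipeline would get the points from an edge detector and fit the line afterwards.

```python
import math
from collections import Counter

def hough_strongest_line(points, theta_step_deg=1):
    """Vote in (rho, theta) space using rho = x*cos(theta) + y*sin(theta),
    with theta in whole degrees from 0 to 179 and rho rounded to the
    nearest pixel. Returns (rho, theta_deg, votes) of the strongest line."""
    votes = Counter()
    for x, y in points:
        for theta_deg in range(0, 180, theta_step_deg):
            t = math.radians(theta_deg)
            rho = round(x * math.cos(t) + y * math.sin(t))
            votes[(rho, theta_deg)] += 1
    (rho, theta_deg), count = votes.most_common(1)[0]
    return rho, theta_deg, count

# Toy edge points (assumption): a vertical line at x = 5 plus two outliers.
points = [(5, y) for y in range(100)] + [(40, 3), (70, 60)]
line = hough_strongest_line(points)
```

Here `line` comes out as rho = 5, theta = 0 degrees with 100 votes, which is the vertical line x = 5; the two outliers do not affect the result.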
Regards