It's really hard to understand your parameters. For example, a sliding window that takes up the whole image height every time is very unusual, so I'll assume you know what you are doing and address your points.

Why do different translations or ROI image sizes produce different scores and predictions?

Your detection window (ROI) must be consistent with the way you trained your SVM. If you trained your algorithm with ROI size x and then run it with ROI size y, the results may become inconsistent and hard to interpret. Changing the translation rules of this window will also change the results, because different translations make the detector evaluate different ROIs, so the scores are necessarily different.
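To make the point about translations concrete, here is a minimal sketch that lists the window positions a detector would visit for two different strides. The function name `window_origins` and the 128x128 image size are made up for illustration; the 64x128 window spanning the full image height mirrors the setup described in the question as I understand it.

```python
def window_origins(image_w, image_h, win_w, win_h, stride_x, stride_y):
    """Return the top-left corner of every window that fits inside the image."""
    xs = range(0, image_w - win_w + 1, stride_x)
    ys = range(0, image_h - win_h + 1, stride_y)
    return [(x, y) for y in ys for x in xs]


# Hypothetical 128x128 image, window as tall as the image.
stride_4 = window_origins(128, 128, 64, 128, 4, 4)
stride_6 = window_origins(128, 128, 64, 128, 6, 6)

# Positions evaluated by both runs; every other position is unique to one run.
shared = set(stride_4) & set(stride_6)

print(len(stride_4), len(stride_6), len(shared))
```

Only the shared positions are scored on identical pixels in both runs, so the best-scoring window can easily differ between the two settings.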

And is there any way to know which displacements are best for achieving good detection?

Only trial and error can provide an answer to this. But the work has been done, so there is no need to reinvent the wheel. Most sliding window detectors work with a window size of (64,128) and use a 4-pixel stride in both directions.
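If you do experiment with the stride, it helps to know what each setting costs. The sketch below counts how many (64,128) windows fit at a given stride; the helper name `count_windows` and the 640x480 image size are assumptions chosen only for illustration.

```python
def count_windows(image_w, image_h, win_w=64, win_h=128, stride=4):
    """Count the windows evaluated at one scale with the same stride in x and y."""
    if image_w < win_w or image_h < win_h:
        return 0
    nx = (image_w - win_w) // stride + 1
    ny = (image_h - win_h) // stride + 1
    return nx * ny


# Hypothetical 640x480 frame: a larger stride means fewer SVM evaluations.
for stride in (4, 8, 16):
    print(stride, count_windows(640, 480, stride=stride))
```

Doubling the stride cuts the number of evaluations roughly by four, at the price of possibly stepping over the position where the window is best aligned with a person.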

Are there any down-resolution factors other than the HOG block calculation? For example, is there a stride in the convolution and score calculation step, or is it densely overlapped with a 1-pixel stride?

I suggest you read some scientific literature. HOG has not been the best way to find people in images for quite a while. Even though people detection in urban scenarios remains one of the unsolved problems in computer vision, there have been significant advances, and there are algorithms that are reasonably accurate and fast. I suggest you review the state of the art. You can start with the paper "How Far are We from Solving Pedestrian Detection?". Keep in mind that there still isn't a reliable detector that works on images alone. Most reliable detectors combine data from different sensors (thermal, LIDAR, RADAR) to achieve good results.

To speed up detection, I changed LAMBDA to 5 instead of 10. Other than missed detections caused by the different pyramid scale sizes, are there any miscalculations or dependent parameters in the LSVM code to be aware of? Or is it safe to change LAMBDA, since it only limits the pyramid scale sizes?

Not really. The fewer scales you evaluate, the fewer pedestrians you detect. You just need to balance speed against the accuracy your application needs.
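As a rough illustration of that trade-off, the sketch below counts pyramid levels under the assumption that LAMBDA is the number of levels per octave, so that consecutive levels differ by a factor of 2^(1/LAMBDA). The function `pyramid_scales`, the 640x480 image, and the 64x128 minimum size are hypothetical, and the sketch ignores any other details of the real LSVM pyramid code.

```python
def pyramid_scales(image_w, image_h, min_w, min_h, levels_per_octave):
    """List downscale factors until the image is smaller than the minimum size."""
    scales = []
    i = 0
    while True:
        scale = 2.0 ** (i / levels_per_octave)
        if image_w / scale < min_w or image_h / scale < min_h:
            break
        scales.append(scale)
        i += 1
    return scales


# Assumed setup: 640x480 image, detector needs at least 64x128 pixels.
fine = pyramid_scales(640, 480, 64, 128, levels_per_octave=10)
coarse = pyramid_scales(640, 480, 64, 128, levels_per_octave=5)

print(len(fine), len(coarse))
```

Under these assumptions, halving LAMBDA halves the number of levels to evaluate, and the levels that are skipped correspond to the pedestrian sizes most likely to be missed.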