
HOG detectMultiScale

asked 2013-07-16 09:15:26 -0600

lazarev

updated 2013-07-16 09:22:26 -0600

Hello everyone,

I have a question about the hog.detectMultiScale method.

I'm trying to perform object detection using HOG and an SVM. I understand that computing the HOG over a 64x128-pixel window returns a 3780-element descriptor. The training was done using SVMlight, and it returns a single descriptor of 3780 weights plus a bias term.
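For reference, the 3780 figure follows from the standard Dalal-Triggs parameters (8x8-pixel cells, 2x2-cell blocks, 8-pixel block stride, 9 orientation bins, which are also OpenCV's HOGDescriptor defaults); a quick sanity check:

```python
# Descriptor length for the standard 64x128 HOG detection window.
win_w, win_h = 64, 128   # detection window size in pixels
cell = 8                 # cell size in pixels
block = 2                # block size in cells (16x16 pixels)
stride = 8               # block stride in pixels
bins = 9                 # orientation histogram bins per cell

# Number of block positions along each axis.
blocks_x = (win_w - block * cell) // stride + 1   # 7
blocks_y = (win_h - block * cell) // stride + 1   # 15

# Each block contributes block*block cells, each with `bins` values.
descriptor_len = blocks_x * blocks_y * block * block * bins
print(descriptor_len)  # 3780
```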

What I don't understand is this: when performing multi-scale detection with a scale factor of, say, 1.05, how can the comparison be done when the detection window is bigger than the HOG window?

I don't think the descriptor computed over the detection window has the same number of elements as the descriptor computed over the HOG window.

Thank you in advance.


1 answer


answered 2013-07-16 09:29:56 -0600

Basically, what HOGDescriptor::detectMultiScale does is take your original image and build an image pyramid from it, using your scale factor. The image is downscaled repeatedly until it becomes smaller than the model window (at which point detection would be impossible), and it can also be upscaled up to a level you define.
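As a rough sketch of that pyramid (assuming a hypothetical 640x480 input, the 64x128 default window, and scale factor 1.05):

```python
# Enumerate the pyramid levels detectMultiScale would search.
img_w, img_h = 640, 480   # hypothetical input image size
win_w, win_h = 64, 128    # model (detection window) size
scale = 1.05              # per-level downscale factor

levels = []
s = 1.0
# Stop once the downscaled image can no longer contain the window.
while img_w / s >= win_w and img_h / s >= win_h:
    levels.append((round(img_w / s), round(img_h / s)))
    s *= scale

print(len(levels), levels[0], levels[-1])
```

At every level the image shrinks while the 64x128 window stays fixed, which is why a single model can cover many object sizes.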

This lets you detect people at a single model scale across many image scales: if a detection happens at a particular pyramid layer, the bounding box is rescaled by the same factor that was applied to the original image to reach that layer.

With this technique you can detect people at multiple scales using only a single model scale, which is computationally cheaper than training a model for every possible scale and running all of them over the image.



Thank you for your answer. I thought that detectMultiScale uses a sliding window that compares every region of my original image with the model. So if I understand you correctly, if the people I want to detect occupy only a very small part of the image (as in aerial imagery), this method can't perform very well? (The training was also done with an aerial imagery dataset.)

lazarev ( 2013-07-16 09:41:46 -0600 )

Yes it can! You are mixing things up: first the pyramid is built, then each layer gets a sliding-window search at the model size to look for matches. Just be prepared to need heaps of training data, tens of thousands of samples at least, if you want good results with objects as variable as humans :)

StevenPuttemans ( 2013-07-16 10:33:14 -0600 )
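To put a number on the per-layer window search described above: with the 64x128 window and an assumed 8-pixel step (the actual step is the winStride parameter of detectMultiScale), a single hypothetical 640x480 pyramid level is scanned at a few thousand positions:

```python
# Count sliding-window placements at one pyramid level.
def n_windows(img_w, img_h, win_w=64, win_h=128, step=8):
    nx = (img_w - win_w) // step + 1   # horizontal placements
    ny = (img_h - win_h) // step + 1   # vertical placements
    return max(nx, 0) * max(ny, 0)    # 0 if the level is too small

print(n_windows(640, 480))  # 3285
```

Each of those placements produces one 3780-element descriptor that is scored against the single trained model, which is why the window and model sizes always match.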

Question Tools


Asked: 2013-07-16 09:15:26 -0600

Seen: 3,134 times

Last updated: Jul 16 '13