Ask Your Question

Revision history [back]

click to hide/show revision 1
initial version

What is the HOG descriptor's shape?

I'm dipping my toe in pedestrian detection with OpenCV using Histogram of Oriented Gradients. I'd like to understand the descriptor better (I banged out a quick visualizer in pyplot), but I'm having trouble figuring out the output data structure. It's a very long 1D array... great for machine learning, not so easy for a human to understand.

Here is my configuration, in OpenCV in Python. "img" is 64x128 and greyscale.

winSize = (64,128)
blockSize = (16,16)
blockStride = (8,8)
cellSize = (8,8)
nbins = 9
derivAperture = 1
winSigma = 4.
histogramNormType = 0
L2HysThreshold = 2.0000000000000001e-01
gammaCorrection = True

hog = cv2.HOGDescriptor(winSize,blockSize,blockStride,cellSize,nbins,derivAperture,winSigma,
                        histogramNormType,L2HysThreshold,gammaCorrection)

winStride = (8,8)
padding = (8,8)
locations = ((10,20),)
hist = hog.compute(img, winStride, padding, locations)

And I get a vector of len 3780 - 7x15 blocks (not 8x16 because of the overlap), 2x2 cells per block, 9 angle bins. Is the shape (7, 15, 2, 2, 9)? Or (2, 2, 7, 15, 9)? Or (14, 30, 9)? Do the angle bins go from 0 to 180 or 180 to 0? Or is a 360 HOG? Does width come first or height?

What is the OpenCV convention?