I'm dipping my toe into pedestrian detection with OpenCV using Histograms of Oriented Gradients (HOG). I'd like to understand the descriptor better (I banged out a quick visualizer in pyplot), but I'm having trouble figuring out the layout of the output. It's a very long 1D array: great for machine learning, not so easy for a human to read.
Here is my configuration, using OpenCV's Python bindings. `img` is a 64x128 (width x height) greyscale image.
```python
import cv2

# img: 64x128 (width x height) greyscale image, loaded elsewhere
winSize = (64, 128)
blockSize = (16, 16)
blockStride = (8, 8)
cellSize = (8, 8)
nbins = 9
derivAperture = 1
winSigma = 4.
histogramNormType = 0  # 0 = L2Hys normalisation
L2HysThreshold = 0.2
gammaCorrection = True
hog = cv2.HOGDescriptor(winSize, blockSize, blockStride, cellSize, nbins,
                        derivAperture, winSigma, histogramNormType,
                        L2HysThreshold, gammaCorrection)
winStride = (8, 8)
padding = (8, 8)
locations = ((10, 20),)
hist = hog.compute(img, winStride, padding, locations)
```
I get a vector of length 3780: 7x15 blocks (not 8x16, because each 16x16 block spans two cells and the blocks step by one 8-pixel cell), 2x2 cells per block, and 9 angle bins per cell. Is the shape (7, 15, 2, 2, 9)? Or (2, 2, 7, 15, 9)? Or (14, 30, 9)? Do the angle bins go from 0 to 180 or from 180 to 0? Or is it a 360° HOG? Does width come first, or height?
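The 3780 figure can be reproduced from the constructor parameters alone. Below is a minimal sketch of that arithmetic. `hog_geometry` is a hypothetical helper, not an OpenCV function, and the order of the returned tuple is only for counting. It makes no claim about how OpenCV lays the values out in memory.

```python
import math


def hog_geometry(win_size, block_size, block_stride, cell_size, nbins):
    """Count blocks, cells per block and bins for ONE detection window.

    All sizes are (width, height) pairs, following OpenCV's Size convention.
    Returns (blocks_x, blocks_y, cells_per_block_x, cells_per_block_y, nbins).
    """
    # Blocks slide across the window in steps of block_stride, so the count
    # along each axis is (window - block) // stride + 1.
    blocks_x = (win_size[0] - block_size[0]) // block_stride[0] + 1
    blocks_y = (win_size[1] - block_size[1]) // block_stride[1] + 1
    # Each block is tiled by non-overlapping cells.
    cells_x = block_size[0] // cell_size[0]
    cells_y = block_size[1] // cell_size[1]
    return blocks_x, blocks_y, cells_x, cells_y, nbins


geom = hog_geometry((64, 128), (16, 16), (8, 8), (8, 8), 9)
print(geom, math.prod(geom))  # (7, 15, 2, 2, 9) 3780
```

So the length alone can't distinguish between the candidate shapes. Any permutation of these factors multiplies out to 3780.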
What is the OpenCV convention?
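One way to test the first guess is to fold the vector back into per-cell histograms and check whether the visualisation looks right. The sketch below rests on two assumptions. First, the flat vector is in C order with axes (block_x, block_y, cell_x, cell_y, bin), which means blocks are enumerated down each column of the window before moving right. Second, with unsigned gradients (OpenCV's default, `signedGradient=False`), the 9 bins span 0 to 180 degrees, and bin k is centred at 20k + 10 degrees. Neither assumption is confirmed here. Neighbouring blocks overlap, so each cell appears in up to four blocks, and each copy is normalised differently. The sketch therefore averages those copies onto an 8x16 cell grid. `hog_cell_grid` and `bin_centres_deg` are hypothetical helper names.

```python
import numpy as np


def hog_cell_grid(descriptor, blocks=(7, 15), cells_per_block=(2, 2), nbins=9):
    """Fold a flat single-window HOG descriptor into a
    (cells_x, cells_y, nbins) grid of per-cell histograms.

    ASSUMPTION (hypothesis, not confirmed): C-order axes are
    (block_x, block_y, cell_x, cell_y, bin), and blocks step by one cell.
    """
    bx, by = blocks
    cx, cy = cells_per_block
    # Accepts (3780,) or (3780, 1) arrays alike.
    h = np.asarray(descriptor, dtype=np.float64).reshape(bx, by, cx, cy, nbins)

    # With a one-cell block stride the window holds bx + cx - 1 cells across.
    n_cells_x, n_cells_y = bx + cx - 1, by + cy - 1
    total = np.zeros((n_cells_x, n_cells_y, nbins))
    count = np.zeros((n_cells_x, n_cells_y, 1))
    for i in range(cx):
        for j in range(cy):
            # Cell (i, j) of block (X, Y) is cell (X + i, Y + j) of the window.
            total[i:i + bx, j:j + by] += h[:, :, i, j, :]
            count[i:i + bx, j:j + by] += 1
    # Average the differently normalised copies of each cell.
    return total / count


def bin_centres_deg(nbins=9, signed=False):
    """ASSUMPTION: bins cover 0-180 degrees (0-360 if signed), with bin k
    centred half a bin width above k * bin_width."""
    span = 360.0 if signed else 180.0
    width = span / nbins
    return (np.arange(nbins) + 0.5) * width
```

With the `hist` from the code above, `cells = hog_cell_grid(hist)` would make `cells[x, y]` the histogram of the cell x cells from the left and y cells from the top, if the assumed ordering holds. You can check this by running `hog.compute` on a synthetic image that contains a single vertical edge. If the hypothesis is right, the energy should land in a narrow vertical band of cells and in the bins at either end of the 0 to 180 range.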