Need help understanding how the HOGDescriptor class works in Java (OpenCV 3.1.0)

asked 2016-07-21 12:54:46 -0600

tdurham721

I am trying to extract HOG features from an image using OpenCV in Java. I have previously done this in MATLAB using its built-in HOG functions, but I am having trouble doing it with OpenCV. The HOGDescriptor constructor takes five parameters. My understanding is that only 16x16 is supported for blockSize, only 8x8 for blockStride and cellSize, and only 9 for nbins. There is no guidance on what to set the window size (winSize) to, although the program throws an exception if either of its dimensions is not divisible by 8. A lot of code I found online sets winSize to 64x128, but none of it explains why. What is this parameter for, and what role does it play in extracting HOG features from an image? The test image I am using right now is a 196x240 grayscale image.
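
A small sketch of the "divisible by 8" constraint, assuming the default geometry from the question (16x16 blocks, 8x8 block stride, 8x8 cells). The rule it checks, as far as I know, is that (winSize - blockSize) must be a multiple of blockStride in both dimensions, which with these defaults is the same as "a multiple of 8". HogWinSizeCheck and isValidWinSize are hypothetical names, not OpenCV API:

```java
/**
 * Hypothetical helper (not an OpenCV API) that mirrors the window-size check,
 * assuming the default geometry: 16x16 blocks, 8x8 block stride, 8x8 cells.
 * A window is accepted when (winSize - blockSize) is a multiple of blockStride
 * in both dimensions.
 */
public class HogWinSizeCheck {

    static final int BLOCK = 16;  // blockSize 16x16
    static final int STRIDE = 8;  // blockStride 8x8
    static final int CELL = 8;    // cellSize 8x8

    /** True if 16x16 blocks stepping by 8 pixels tile a winW x winH window exactly. */
    public static boolean isValidWinSize(int winW, int winH) {
        boolean blockMatchesCells = BLOCK % CELL == 0;
        // Extra guard added here (assumption): the window must hold at least one block.
        boolean widthOk = winW >= BLOCK && (winW - BLOCK) % STRIDE == 0;
        boolean heightOk = winH >= BLOCK && (winH - BLOCK) % STRIDE == 0;
        return blockMatchesCells && widthOk && heightOk;
    }

    public static void main(String[] args) {
        int[][] sizes = { {64, 128}, {48, 96}, {32, 32}, {196, 240}, {192, 240} };
        for (int[] s : sizes) {
            System.out.println(s[0] + "x" + s[1] + " -> " + isValidWinSize(s[0], s[1]));
        }
    }
}
```

It prints true for 64x128, 48x96, 32x32 and 192x240, and false for 196x240, because 196 - 16 = 180 is not a multiple of 8. So the 196x240 test image could not itself be used as winSize, although an image larger than winSize can still be passed to compute().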

I am also unsure what the difference is between the compute and computeGradient methods. I honestly cannot find much information about them in the documentation. If you could help me understand these things, or point me to a good place to read about them, I would appreciate it a lot!

Thanks

1 answer

answered 2016-07-21 14:11:31 -0600

berak

updated 2016-07-21 14:27:59 -0600

IMHO, you nailed it: winSize is the only "variable" here; everything else is fixed.

winSize is the size of the images the descriptor was trained on, e.g. 64x128 for the INRIA person detector and 48x96 for the Daimler person detector. If you trained your own "heads or tails" coin detector, you'd probably choose 32x32. It is also the minimum object size that detectMultiScale() can detect later.
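
To make the link between winSize and the feature vector concrete: with the fixed defaults, the window size alone decides how many blocks fit into the window, and therefore how long one descriptor is. A plain-Java sketch of that arithmetic, assuming 16x16 blocks, 8x8 stride, 8x8 cells and 9 bins (this mirrors, as far as I know, the formula behind getDescriptorSize(); descriptorLength is a hypothetical helper, not an OpenCV method):

```java
/**
 * Sketch of how winSize determines the length of one HOG descriptor, assuming the
 * default geometry (16x16 blocks, 8x8 block stride, 8x8 cells, 9 bins).
 */
public class HogDescriptorLength {

    public static long descriptorLength(int winW, int winH) {
        final int block = 16, stride = 8, cell = 8, nbins = 9;
        long blocksX = (winW - block) / stride + 1;   // block positions across the window
        long blocksY = (winH - block) / stride + 1;   // block positions down the window
        long cellsPerBlock = (long) (block / cell) * (block / cell);  // 2x2 = 4 cells
        return blocksX * blocksY * cellsPerBlock * nbins;  // one 9-bin histogram per cell
    }

    public static void main(String[] args) {
        System.out.println("64x128 (INRIA people):  " + descriptorLength(64, 128));
        System.out.println("48x96 (Daimler people): " + descriptorLength(48, 96));
        System.out.println("32x32 (coin example):   " + descriptorLength(32, 32));
    }
}
```

This prints 3780, 1980 and 324: 7 x 15 = 105 blocks for 64x128, 5 x 11 = 55 for 48x96, and 3 x 3 = 9 for 32x32, each contributing 4 cells x 9 bins = 36 values.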

(Also, it's probably safe to drop "Java" from this question entirely; none of this is Java-specific.)

The HOGDescriptor class serves two purposes. On the one hand, you can load a pretrained SVM (setSVMDetector()) and use detectMultiScale() to find the things it was trained on (e.g. persons) in arbitrary images.

On the other hand (think of the coin example above), you can use its compute() function to extract feature vectors for further machine learning later, e.g. to train your own detector on positive and negative windows of size winSize.
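
What "loading a pretrained SVM" boils down to, roughly: a linear SVM is a weight vector plus a bias, and each window's descriptor gets a score. The toy sketch below assumes the detector array stores the weights first and the bias as one trailing element (the default 64x128 people detector has 3781 = 3780 + 1 values), and that a window counts as a hit when the score reaches hitThreshold. All numbers and the class name are made up for illustration:

```java
/**
 * Toy illustration of linear-SVM window scoring. Assumption: detector = weights
 * followed by one bias term; a window is a hit if w . x + bias >= hitThreshold.
 */
public class LinearSvmWindowScore {

    /** Score = w . x + bias for one window descriptor. */
    public static double score(float[] detector, float[] descriptor) {
        if (detector.length != descriptor.length + 1) {
            throw new IllegalArgumentException("detector must be weights plus one bias term");
        }
        double s = detector[descriptor.length];  // trailing bias term
        for (int i = 0; i < descriptor.length; i++) {
            s += detector[i] * descriptor[i];    // dot product with the weights
        }
        return s;
    }

    public static boolean isHit(float[] detector, float[] descriptor, double hitThreshold) {
        return score(detector, descriptor) >= hitThreshold;
    }

    public static void main(String[] args) {
        float[] detector = {0.5f, -0.25f, 1.0f, -0.1f};  // 3 hypothetical weights + bias
        float[] window = {0.2f, 0.4f, 0.3f};             // hypothetical 3-value "descriptor"
        System.out.println("score: " + score(detector, window));
        System.out.println("hit:   " + isHit(detector, window, 0.0));
    }
}
```

The score here is about 0.2, so the window counts as a hit at threshold 0. A real detector does the same thing with 3780-value descriptors, once per window position and scale.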
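
One practical consequence for the 196x240 test image: if compute() is run on an image larger than winSize, the window slides over it and the result is the concatenation of one descriptor per window position, not a single vector. A rough count, assuming a 64x128 window (3780 values each), an 8x8 window stride and no padding; HogSlidingWindows is a hypothetical helper:

```java
/**
 * Rough count of window positions when a HOG window slides over a whole image.
 * Assumptions: window stride 8x8, no padding, no explicit locations.
 */
public class HogSlidingWindows {

    /** Number of window positions along one axis (0 if the image is too small). */
    static int positions(int imageLen, int winLen, int stride) {
        if (imageLen < winLen) return 0;
        return (imageLen - winLen) / stride + 1;
    }

    public static long windowCount(int imgW, int imgH, int winW, int winH, int stride) {
        return (long) positions(imgW, winW, stride) * positions(imgH, winH, stride);
    }

    public static void main(String[] args) {
        long windows = windowCount(196, 240, 64, 128, 8);  // the 196x240 test image
        long perWindow = 3780;                              // 64x128 descriptor length
        System.out.println("windows: " + windows);
        System.out.println("floats:  " + windows * perWindow);

        long single = windowCount(64, 128, 64, 128, 8);     // image already window-sized
        System.out.println("64x128 patch -> windows: " + single);
    }
}
```

Under these assumptions it prints 255 windows (17 x 15) and 963900 floats for the full image, versus exactly one window for a 64x128 patch. If the goal is one fixed-length vector per image, resizing or cropping each image to winSize first is one simple option.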

Comments

Ah, OK, that makes sense. I could not tell what it was from the name winSize alone, but it is easy to understand now. Do you know anything about the differences between the compute and computeGradient methods? The documentation only lists the parameters the methods take; it doesn't say anything about what they actually calculate.

tdurham721 (2016-07-21 14:28:34 -0600)

"Do you know anything about the differences in the methods compute and computeGradient?"

Unfortunately, not really. compute() returns a complete HOG descriptor (histograms of gradients taken over N patches inside the window), and I can only speculate that computeGradient() is something more low-level that only retrieves part of it.

berak (2016-07-22 01:01:09 -0600)
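
For what it's worth, computeGradient()'s signature (it fills a grad Mat and an angleOfs Mat from the input image) suggests it exposes the per-pixel gradient stage that compute() builds its histograms from. The plain-Java sketch below illustrates that stage in simplified form: central-difference gradients, magnitude, an unsigned orientation in 9 bins of 20 degrees, and a per-cell histogram. It is not OpenCV's code (as far as I can tell, OpenCV also splits each magnitude between two neighbouring bins), and gradientAt/cellHistogram are hypothetical names:

```java
import java.util.Arrays;

/**
 * Simplified illustration of HOG's per-pixel gradient step and one cell histogram.
 * Not OpenCV's implementation: no bin interpolation, no block normalization.
 */
public class HogGradientSketch {

    static final int NBINS = 9;  // unsigned orientations 0..180 degrees, 20 degrees per bin

    /** Returns {magnitude, binIndex} for pixel (x, y); border pixels are clamped. */
    public static double[] gradientAt(double[][] img, int x, int y) {
        int h = img.length, w = img[0].length;
        double dx = img[y][Math.min(x + 1, w - 1)] - img[y][Math.max(x - 1, 0)];
        double dy = img[Math.min(y + 1, h - 1)][x] - img[Math.max(y - 1, 0)][x];
        double mag = Math.sqrt(dx * dx + dy * dy);
        double angle = Math.toDegrees(Math.atan2(dy, dx));  // range -180..180
        if (angle < 0) angle += 180;                         // fold to unsigned 0..180
        int bin = (int) (angle / (180.0 / NBINS)) % NBINS;   // 180 wraps to bin 0
        return new double[] {mag, bin};
    }

    /** Sums gradient magnitudes into a 9-bin histogram over one square cell. */
    public static double[] cellHistogram(double[][] img, int x0, int y0, int cellSize) {
        double[] hist = new double[NBINS];
        for (int y = y0; y < y0 + cellSize; y++) {
            for (int x = x0; x < x0 + cellSize; x++) {
                double[] g = gradientAt(img, x, y);
                hist[(int) g[1]] += g[0];
            }
        }
        return hist;
    }

    public static void main(String[] args) {
        double[][] ramp = new double[8][8];  // brightness grows left to right
        for (int y = 0; y < 8; y++) {
            for (int x = 0; x < 8; x++) {
                ramp[y][x] = 10 * x;
            }
        }
        double[] g = gradientAt(ramp, 3, 3);
        System.out.println("magnitude " + g[0] + ", bin " + (int) g[1]);
        System.out.println(Arrays.toString(cellHistogram(ramp, 0, 0, 8)));
    }
}
```

For the horizontal ramp it prints magnitude 20.0 in bin 0 for an interior pixel, and a cell histogram with all of its weight (1120.0) in bin 0, since every gradient points the same way. compute() goes further: it groups such cell histograms into overlapping blocks, normalizes them, and concatenates the results into the descriptor.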

OK, thank you for your help. I will stick to using compute for now.

tdurham721 (2016-07-22 08:27:30 -0600)


Stats

Seen: 833 times

Last updated: Jul 21 '16