Training GPU HOGDescriptor for multi-scale detection

Hi, I'm trying to train and use the GPU-accelerated HOG detector with the current git version. Something isn't working: way too many positives are generated. Maybe someone can point out my mistake, or point me to a tutorial for the current version (i.e., one using getDescriptors rather than "compute"). Here's what I've tried so far:

1) Separated my training images into positive and negative sets, resized them to 32x32, and converted them to BGRA

2) Created gpu::HOGDescriptor using the default parameters, except for a 32x32 window size

HOGDescriptor(Size win_size=Size(32, 32), Size block_size=Size(16, 16),
              Size block_stride=Size(8, 8), Size cell_size=Size(8, 8),
              int nbins=9, double win_sigma=DEFAULT_WIN_SIGMA,
              double threshold_L2hys=0.2, bool gamma_correction=true,
              int nlevels=DEFAULT_NLEVELS);
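
As a sanity check on these parameters, here is a minimal sketch of the descriptor length they imply. It uses the usual HOG layout (blocks per window, times cells per block, times bins). `Size2` and `hogDescriptorLength` are hypothetical names introduced only for this illustration, so the snippet builds without OpenCV.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for cv::Size so the sketch compiles without OpenCV.
struct Size2 { int width; int height; };

// Number of floats in one HOG window descriptor:
// (blocks per window) * (cells per block) * (histogram bins per cell).
std::size_t hogDescriptorLength(Size2 win, Size2 block, Size2 blockStride,
                                Size2 cell, int nbins) {
    // The window must be tiled exactly by the block stride,
    // and the block must be tiled exactly by cells.
    assert((win.width - block.width) % blockStride.width == 0);
    assert((win.height - block.height) % blockStride.height == 0);
    assert(block.width % cell.width == 0 && block.height % cell.height == 0);

    const int blocksX = (win.width - block.width) / blockStride.width + 1;
    const int blocksY = (win.height - block.height) / blockStride.height + 1;
    const int cellsPerBlock =
        (block.width / cell.width) * (block.height / cell.height);
    return static_cast<std::size_t>(blocksX) * blocksY * cellsPerBlock * nbins;
}
```

For a 32x32 window with 16x16 blocks, 8x8 stride, 8x8 cells and 9 bins, this gives 3 * 3 * 4 * 9 = 324 values per window.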

3) Each image is uploaded to the GPU, and the descriptor matrix is calculated using getDescriptors

hog.getDescriptors(gpu_image, cv::Size(8,8), d_descriptors);

4) Since my window size is the same as my image size, I only get 1 row in my descriptor Mat, which I then write out to file.
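
The single-row observation can be checked with a small sketch that counts window positions. It assumes windows are placed on a regular grid with no padding, and that each window position yields one descriptor row. `windowsPerImage` is a hypothetical helper, not an OpenCV function.

```cpp
#include <cassert>
#include <cstddef>

// Number of detection windows that fit in an image when sliding by the
// given stride, assuming no padding around the image.
std::size_t windowsPerImage(int imgW, int imgH, int winW, int winH,
                            int strideW, int strideH) {
    assert(imgW >= winW && imgH >= winH);
    assert(strideW > 0 && strideH > 0);
    const int nx = (imgW - winW) / strideW + 1;
    const int ny = (imgH - winH) / strideH + 1;
    return static_cast<std::size_t>(nx) * ny;
}
```

With a 32x32 image, a 32x32 window and an 8x8 stride, only one window fits, which matches the single descriptor row. A 64x64 image with the same settings would give 5 * 5 = 25 rows.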

5) These get stored in the svmlight / libsvm standard way: line entries beginning with +1/-1 followed by idx:val sequences for the entries in the descriptor Mat
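
A minimal sketch of that serialization, assuming the descriptor row has already been downloaded into a `std::vector<float>`. `toSvmLightLine` is a hypothetical helper name. Feature indices start at 1, as the svmlight / libsvm format requires.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Formats one training example as "<label> 1:v1 2:v2 ...".
std::string toSvmLightLine(int label, const std::vector<float>& descriptor) {
    assert(label == 1 || label == -1);
    std::ostringstream out;
    out.precision(9);  // enough significant digits to round-trip a float
    out << (label > 0 ? "+1" : "-1");
    for (std::size_t i = 0; i < descriptor.size(); ++i) {
        out << ' ' << (i + 1) << ':' << descriptor[i];  // 1-based indices
    }
    return out.str();
}
```

One such line is written per training image, so a 324-value descriptor produces indices 1 through 324.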

6) I've trained using both svmlight & libsvm, using a linear kernel, and testing both as classification and regression, just to see if I'm screwing things up. The SVM appears to be well-trained, in that I get a seemingly OK loss from the SVM, and when I run on my original data, I get really nice separation of the classes. I convert the generated support vectors into a single vector (w[i] = sum_j SV_j[i] * a[j]), and then I test the vector against the training input as a sanity check. Regression for the true positives is generally at/above +1, and values for true negatives are generally below -1 (great!)
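
The collapse of the support vectors into a single weight vector can be sketched as follows. This is only valid for a linear kernel, and it assumes each coefficient `a[j]` already includes the label sign (alpha_j * y_j), which is how libsvm stores its `sv_coef` values. `collapseSupportVectors` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Computes w[i] = sum_j a[j] * SV_j[i] for a linear-kernel SVM.
// sv[j] is the j-th support vector; a[j] is its signed coefficient.
std::vector<double> collapseSupportVectors(
        const std::vector<std::vector<double> >& sv,
        const std::vector<double>& a) {
    assert(sv.size() == a.size());
    const std::size_t dim = sv.empty() ? 0 : sv[0].size();
    std::vector<double> w(dim, 0.0);
    for (std::size_t j = 0; j < sv.size(); ++j) {
        assert(sv[j].size() == dim);  // all support vectors share one length
        for (std::size_t i = 0; i < dim; ++i) {
            w[i] += a[j] * sv[j][i];
        }
    }
    return w;
}
```

The result has one entry per descriptor element, independent of how many support vectors the model contains.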

7) I then take the SVM vector w and the distance term from my SVM, and push them all into a std::vector with the bias term last. The vector is now the descriptor length + 1 long (for me, 324 + 1 = 325). I use

hog.setSVMDetector(detector);
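
One thing worth checking at this step is the sign of the bias term. The sketch below rests on two assumptions that should be verified against the versions in use: that the trainer reports its offset as rho with decision function f(x) = w·x - rho (the libsvm convention), and that the HOG detector adds the last element of the detector vector to the dot product before comparing with hit_threshold. Under those assumptions the appended value is -rho, not rho. `makeDetector` and `windowScore` are hypothetical helpers.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Packs weights and bias into one vector, bias last.
// ASSUMPTION: the trainer's decision function is w.x - rho, and the
// detector computes w.x + detector.back(), so the bias entry is -rho.
std::vector<float> makeDetector(const std::vector<double>& w, double rho) {
    std::vector<float> detector;
    detector.reserve(w.size() + 1);
    for (std::size_t i = 0; i < w.size(); ++i) {
        detector.push_back(static_cast<float>(w[i]));
    }
    detector.push_back(static_cast<float>(-rho));
    return detector;
}

// CPU-side check: score one descriptor the way the detector is assumed to.
double windowScore(const std::vector<float>& detector,
                   const std::vector<float>& descriptor) {
    assert(detector.size() == descriptor.size() + 1);
    double s = detector.back();  // start from the bias term
    for (std::size_t i = 0; i < descriptor.size(); ++i) {
        s += static_cast<double>(detector[i]) * descriptor[i];
    }
    return s;
}
```

Running `windowScore` on the training descriptors should reproduce the SVM's own decision values. If the scores come out shifted by 2 * rho, the bias sign is the likely culprit.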

8) Now I put in images from my test sequence... upload img to GPU, call with reasonable parameters:

std::vector<Rect> found;
hog.detectMultiScale(gpu_img, found, /*hit_threshold=*/1.4, /*win_stride=*/Size(8, 8),
                     /*padding=*/Size(0, 0), /*scale=*/1.05, /*gr_threshold=*/8);
for (size_t i = 0; i < found.size(); i++) {
  Rect r = found[i];
  rectangle(image, r.tl(), r.br(), CV_RGB(0, 255, 0), 3);
}

The problem is that I get an unreasonable number of false positives: the whole cv::Mat is painted with green boxes. I expect some false positives (which I'll then resolve through re-training), but it seems that every sub-window of the correct size and stride has been tagged by detectMultiScale. I must be using something incorrectly, but I'm not sure where.

The biggest unknown for me is what format OpenCV expects for the SVM vector in gpu::HOGDescriptor::setSVMDetector.

I've also tried switching the descriptor layout during training (i.e., changing the getDescriptors default parameter int descr_format=DESCR_FORMAT_COL_BY_COL to DESCR_FORMAT_ROW_BY_ROW instead).
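
To compare the two layouts offline, a descriptor can be permuted on the CPU. The sketch below assumes the two formats differ only in the order of the blocks within a window (row-major versus column-major), with each block stored as a contiguous run of `blockHistSize` floats. That layout is an assumption, not something confirmed from the OpenCV source, and `rowByRowToColByCol` is a hypothetical helper.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Reorders a window descriptor from row-major block order to
// column-major block order. ASSUMPTION: each block occupies
// blockHistSize consecutive floats and only the block order differs.
std::vector<float> rowByRowToColByCol(const std::vector<float>& rowMajor,
                                      std::size_t blocksX,
                                      std::size_t blocksY,
                                      std::size_t blockHistSize) {
    assert(rowMajor.size() == blocksX * blocksY * blockHistSize);
    std::vector<float> colMajor(rowMajor.size());
    for (std::size_t by = 0; by < blocksY; ++by) {
        for (std::size_t bx = 0; bx < blocksX; ++bx) {
            const std::size_t src = (by * blocksX + bx) * blockHistSize;
            const std::size_t dst = (bx * blocksY + by) * blockHistSize;
            for (std::size_t k = 0; k < blockHistSize; ++k) {
                colMajor[dst + k] = rowMajor[src + k];
            }
        }
    }
    return colMajor;
}
```

For the 32x32 setup above this would be called with blocksX = 3, blocksY = 3 and blockHistSize = 36. Whichever layout is used for training, the weight vector passed to the detector has to follow the layout the detector expects.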

Any suggestions are appreciated, as well as pointers to tutorials that work with current version of gpu::HOGDescriptor.

Thanks!