HOG optimal training images

asked 2017-03-21 08:51:23 -0500

zelade


I'm going to train a HOG descriptor on traffic signs, and I wonder which pictures are best suited. How many pictures do I need for good results? What size should the images have, and what influence does this have on the later detection? Is it good for the positive images to leave a narrow border so that some background is visible? Is anyone familiar with this? I would like to get an estimate before I take the photos.

Thanks in advance.




please clarify, do you want to:

  • train a HOGDescriptor to detect a single class of traffic signs ? (like stop or speed-limit)
  • or do you want to build a classifier to distinguish between several signs ? (e.g. use HOG features with an SVM or ANN)
berak ( 2017-03-21 11:14:28 -0500 )

I want to distinguish between several signs. I'm going to train with SVM Light and compute the vector for detectMultiScale. In my implementation I would use one detectMultiScale call for each sign, but I wonder whether that would still run in real time. What's the difference between the two options? Or, put differently, how does the second one differ from mine?

I just ran a test with one street sign and it worked quite well. I used images of size 48 x 48 with a narrow border, 180 positive and 4000 negative. I have found that signs close to the camera are not recognized. Is this due to the image size?

zelade ( 2017-03-22 07:11:18 -0500 )

detectMultiScale can only be used to detect a single object class

if you use an SVM as a multi-class classifier, you will need some other tool for detection / segmentation, e.g. findContours()

berak ( 2017-03-22 07:23:11 -0500 )

Okay, I'll look at this. But if I use it as a multi-class classifier, the training process differs from my variant, right?

So if I understand you correctly, the first step would be to find the contours of the signs in the frame. Then, in the second step, I calculate the HOG features of the contours (rectangles) and use predict() to classify the sign?

zelade ( 2017-03-22 07:55:12 -0500 )

2 answers


answered 2017-03-22 08:04:27 -0500

berak

updated 2017-03-22 08:19:08 -0500

ok, let's summarize the 2 approaches.

if you want to detect a single object class (detectMultiScale):

  • train:
      • crop all positive and negative images to the same window size (e.g. 24x24). this is the minimum size that can be detected later
      • use train_HOG.cpp (from the samples) to train an SVM (regression), and save it
  • detect:
      • load the single, pretrained SVM support vector into the HOGDescriptor
      • run detectMultiScale on an (arbitrarily sized) grayscale image.
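One reason every crop has to share the same window size is that the HOG descriptor length is fixed by the window geometry. The following is a minimal sketch of that arithmetic in plain Python. The function name is hypothetical, and the default block, stride, cell and bin values are assumed to mirror OpenCV's HOGDescriptor defaults; adjust them if your descriptor is configured differently.

```python
def hog_descriptor_length(win_size, block_size=(16, 16), block_stride=(8, 8),
                          cell_size=(8, 8), nbins=9):
    """Return the number of floats in one HOG descriptor for a single window.

    All sizes are (width, height) in pixels. The defaults are assumptions
    chosen to match common HOGDescriptor settings.
    """
    for win, block, stride in zip(win_size, block_size, block_stride):
        # The window must hold a whole number of block steps.
        if win < block or (win - block) % stride != 0:
            raise ValueError("window size does not fit the block size and stride")
    for block, cell in zip(block_size, cell_size):
        if block % cell != 0:
            raise ValueError("block size must be a multiple of the cell size")

    blocks_x = (win_size[0] - block_size[0]) // block_stride[0] + 1
    blocks_y = (win_size[1] - block_size[1]) // block_stride[1] + 1
    cells_per_block = (block_size[0] // cell_size[0]) * (block_size[1] // cell_size[1])
    return blocks_x * blocks_y * cells_per_block * nbins


print(hog_descriptor_length((24, 24)))   # 2 * 2 blocks * 4 cells * 9 bins = 144
print(hog_descriptor_length((48, 48)))   # 5 * 5 blocks * 4 cells * 9 bins = 900
print(hog_descriptor_length((64, 128)))  # 7 * 15 blocks * 4 cells * 9 bins = 3780
```

A 24x24 crop and a 48x48 crop therefore produce vectors of different lengths, which is why a single SVM cannot be trained on a mix of the two.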

if you want to classify multiple traffic signs:

  • train:
      • crop all images to the same window size (e.g. 24x24) (same as above)
      • get HOG descriptors for each, reshape them to a single row, and push_back all of them into a single large Mat. you also need a "labels" Mat, containing the class id for each descriptor
      • train an SVM (or ANN, or KNN) with this data and labels (classification)
  • test:
      • find contours in the large image
      • get the boundingRect() of each contour
      • get the image ROI (crop) of that rect, resize it to the HOG window size you trained your SVM on
      • get the HOG feature from that ROI
      • predict() with that HOG feature
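The data layout for the multi-class training step can be sketched as follows. This is only an illustration under assumptions: plain Python lists stand in for the Mat (or NumPy array) objects, the function name is hypothetical, and the toy descriptors have three values instead of a real HOG length. Each descriptor arrives as a single column, is turned into one row, and gets one entry in a parallel list of class ids.

```python
def build_training_set(descriptors_by_class):
    """Stack column descriptors into rows and build the matching label list.

    descriptors_by_class maps a class id to a list of descriptors, where each
    descriptor is a column: a list of one-element lists (N rows, 1 column).
    """
    samples, labels = [], []
    expected_length = None
    for class_id in sorted(descriptors_by_class):
        for column in descriptors_by_class[class_id]:
            # N x 1 column -> 1 x N row
            row = [float(entry[0]) for entry in column]
            if expected_length is None:
                expected_length = len(row)
            elif len(row) != expected_length:
                raise ValueError("all descriptors must have the same length")
            samples.append(row)
            labels.append(class_id)
    return samples, labels


# Toy data: two descriptors for class 0, one for class 1.
toy = {
    0: [[[0.1], [0.2], [0.3]], [[0.0], [0.5], [0.4]]],
    1: [[[0.9], [0.8], [0.7]]],
}
samples, labels = build_training_set(toy)
print(labels)      # [0, 0, 1]
print(samples[2])  # [0.9, 0.8, 0.7]
```

The length check is there because crops of different sizes yield descriptors of different lengths, and a classifier needs every row to have the same number of columns.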


Thanks for the detailed answer!

zelade ( 2017-03-22 08:32:06 -0500 )

How do you label the negative images?

zelade ( 2017-03-22 10:43:40 -0500 )

Hi, is there any function like "prediction" for the GPU (CUDA C++)? If not, what are the alternatives? How can I implement "prediction" on the GPU? Thank you!

olarvik ( 2018-04-24 05:14:54 -0500 )

@olarvik, unfortunately, opencv does not have any CUDA ml classes/methods

(there is a cuda based HOG in cudaobjdetect, though)

berak ( 2018-04-24 05:51:03 -0500 )

@berak, thank you for the answer! That is unfortunate. I think I can use the one-against-all method for multi-class classification, with training on the CPU and testing on the GPU.

olarvik ( 2018-04-24 06:30:30 -0500 )
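The one-against-all idea mentioned in the comment above can be sketched in a few lines: keep one binary scorer per sign class and pick the class whose scorer returns the highest value. This is a minimal sketch, not anyone's actual implementation. The helper names are hypothetical, and the linear scorers with made-up weights stand in for trained binary SVMs, whose decision function for a linear kernel is a dot product plus a bias.

```python
def linear_scorer(weights, bias):
    """Return a function computing a linear decision score w . x + b."""
    def score(feature):
        return sum(w * x for w, x in zip(weights, feature)) + bias
    return score


def predict_one_vs_rest(feature, scorers):
    """Pick the class label whose binary scorer gives the highest score.

    scorers maps a class label to a callable that takes a feature vector and
    returns a number (higher means "more likely this class").
    """
    if not scorers:
        raise ValueError("at least one scorer is required")
    return max(scorers, key=lambda label: scorers[label](feature))


# Toy two-dimensional "features" with made-up weights, for illustration only.
scorers = {
    "stop": linear_scorer([1.0, 0.0], 0.0),
    "yield": linear_scorer([0.0, 1.0], 0.0),
}
print(predict_one_vs_rest([0.2, 0.9], scorers))  # yield
print(predict_one_vs_rest([0.8, 0.1], scorers))  # stop
```

Because each score is just a dot product, this prediction step is the part that could in principle be moved to the GPU while training stays on the CPU.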

answered 2017-03-23 08:51:24 -0500

zelade

Okay, I just implemented it following your steps. First I extracted the features and trained a linear SVM classifier.

To classify, I wrote a Python script that uses sliding windows and runs predict on each window. First I load the classifier I created, then I load the test image. I downscale the image and iterate; in each iteration I run the sliding window. For each window I calculate the HOG features and call predict. The detections are stored in a list. The detector is working, but I have two problems.

The first problem is that it's very slow. Is there an alternative to sliding windows? Some kind of contour detection to find the signs? The second problem is that I receive the following DeprecationWarning message:

    Traceback (most recent call last):
      File "", line 79, in <module>
        pred = clf.predict(fd)
      File "/home/pi/.virtualenvs/py2cv3/local/lib/python2.7/site-packages/sklearn/linear_model/", line 336, in predict
        scores = self.decision_function(X)
      File "/home/pi/.virtualenvs/py2cv3/local/lib/python2.7/site-packages/sklearn/linear_model/", line 312, in decision_function
        X = check_array(X, accept_sparse='csr')
      File "/home/pi/.virtualenvs/py2cv3/local/lib/python2.7/site-packages/sklearn/utils/", line 395, in check_array
        DeprecationWarning: Passing 1d arrays as data is deprecated in 0.17 and will raise ValueError in 0.19. Reshape your data either using X.reshape(-1, 1) if your data has a single feature or X.reshape(1, -1) if it contains a single sample.

I don't know why it appears, but it is annoying because it shows up on every iteration (>100 times per image).
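To get a feeling for why the sliding-window loop described above is slow, it helps to count how many windows it visits. The sketch below enumerates window positions over a downscaled image pyramid in plain Python. The function names are hypothetical, and the step of 8 pixels and the scale factor of 1.25 are assumptions; only the 48 x 48 window comes from the thread.

```python
def sliding_windows(image_size, window=(48, 48), step=8, scale=1.25):
    """Yield (level, x, y) for every window position on every pyramid level.

    image_size and window are (width, height). After each level the image is
    shrunk by `scale`, so the fixed-size window covers larger objects.
    """
    if scale <= 1.0:
        raise ValueError("scale must be greater than 1")
    if step < 1:
        raise ValueError("step must be at least 1")
    width, height = image_size
    level = 0
    while width >= window[0] and height >= window[1]:
        for y in range(0, height - window[1] + 1, step):
            for x in range(0, width - window[0] + 1, step):
                yield level, x, y
        width, height = int(width / scale), int(height / scale)
        level += 1


def count_windows(image_size, **kwargs):
    """Count how many windows (HOG + predict calls) one image requires."""
    return sum(1 for _ in sliding_windows(image_size, **kwargs))


print(count_windows((64, 64)))    # 9 windows at full size + 1 at the next level = 10
print(count_windows((640, 480)))  # every one of these costs a HOG + predict call
```

Each yielded position means one HOG computation and one predict call, so a larger step, a larger scale factor, or a cheap pre-filter that proposes candidate regions all reduce the work directly.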



  1. yea sliding windows are slow. you'd probably need to do this on different scales, too, even more work.

  2. opencv's HOGDescriptor returns a single column. i got no idea about the sklearn SVM (sorry, can't help with it), but for opencv you'd have to reshape it to a single row feature.

berak ( 2017-03-23 09:37:25 -0500 )

I think real time will not work well then. Okay, thank you, I'll try extracting the HOG features with OpenCV then.

zelade ( 2017-03-23 09:53:32 -0500 )

I just reshaped the features:

    fd = hog(image, orientations, pixels_per_cell, ... )
    fd = fd.reshape(1, -1)
    pred = clf.predict(fd)

This solved the problem for me, I think. Thank you for the hint.

zelade ( 2017-03-24 04:57:32 -0500 )


Asked: 2017-03-21 08:51:23 -0500

Seen: 1,595 times

Last updated: Mar 23 '17