Help Training Binary Logistic Regression Classifier on HOG Features

asked 2018-09-15 14:50:56 -0500

9261

updated 2018-09-16 15:07:02 -0500

Hello! I am trying to train a binary logistic regression classifier on HOG features calculated with OpenCV's built-in implementation of the HOG descriptor, using OpenCV's Java bindings. This is my first time training a machine learning model with OpenCV. The model seems to train correctly, but when I test it on some of the data I collected, I get a matrix that seems to be much larger than it should be and full of empty spaces. I would appreciate any help you could give me with this! I got most of my code from this question. Here is my code:

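import java.io.File;
import java.util.ArrayList;
import java.util.List;

import org.opencv.core.Core;
import org.opencv.core.CvType;
import org.opencv.core.Mat;
import org.opencv.core.MatOfFloat;
import org.opencv.core.MatOfPoint;
import org.opencv.core.Size;
import org.opencv.imgcodecs.Imgcodecs;
import org.opencv.ml.LogisticRegression;
import org.opencv.ml.Ml;
import org.opencv.objdetect.HOGDescriptor;

public class HogLogisticRegression { // placeholder class name

    public static void main(String[] args){
        System.loadLibrary(Core.NATIVE_LIBRARY_NAME); // load OpenCV's native library first

        String DATABASEPos = "C:\\TrainingArena\\yalefaces\\yaleB11\\t";
        String DATABASENeg = "C:\\TrainingArena\\neg";

        //List initialization
        ArrayList<Integer> training_labels_array = new ArrayList<>();
        ArrayList<Integer> testing_labels_array = new ArrayList<>();
        Mat TRAINING_DATA = new Mat();
        Mat TESTING_DATA = new Mat();

        // Load training and testing data
        File[] posDirectories = new File(DATABASEPos).listFiles();
        File[] negDirectories = new File(DATABASENeg).listFiles();

        HOGDescriptor hog = new HOGDescriptor(new Size(160,120),new Size(80,60),new Size(40,30), new Size(80,60),8);

        int numImages = 100;
        int negLabel = 0;
        int posLabel = 1;
        Size winStride = new Size(128,128);
        Size padding = new Size(8,8);

        // Assumed split (the original calls were not shown): files [0, numImages) of each
        // folder are used for training and the next numImages files for testing.
        loadData(posDirectories, hog, posLabel, TRAINING_DATA, training_labels_array, 0, numImages, winStride, padding);
        loadData(negDirectories, hog, negLabel, TRAINING_DATA, training_labels_array, 0, numImages, winStride, padding);
        loadData(posDirectories, hog, posLabel, TESTING_DATA, testing_labels_array, numImages, numImages, winStride, padding);
        loadData(negDirectories, hog, negLabel, TESTING_DATA, testing_labels_array, numImages, numImages, winStride, padding);

        // Each row of TRAINING_DATA / TESTING_DATA already holds one HOG descriptor
        // (ROW_SAMPLE layout), so no reshape is needed here.

        // LogisticRegression expects CV_32F labels in a column: one row per sample
        Mat TRAINING_LABELS = Mat.zeros(TRAINING_DATA.rows(), 1, CvType.CV_32FC1);
        for(int i = 0; i < training_labels_array.size(); i++){
            TRAINING_LABELS.put(i, 0, training_labels_array.get(i));
        }
        Mat TESTING_LABELS = Mat.zeros(TESTING_DATA.rows(), 1, CvType.CV_32FC1);
        for(int i = 0; i < testing_labels_array.size(); i++){
            TESTING_LABELS.put(i, 0, testing_labels_array.get(i));
        }

        System.out.println("TRAINING_DATA - Rows:" + TRAINING_DATA.rows() + " Cols:" + TRAINING_DATA.cols());
        System.out.println("TRAINING_LABELS - Rows:" + TRAINING_LABELS.rows() + " Cols:" + TRAINING_LABELS.cols());
        System.out.println("TESTING_DATA - Rows:" + TESTING_DATA.rows() + " Cols:" + TESTING_DATA.cols());
        System.out.println("TESTING_LABELS - Rows:" + TESTING_LABELS.rows() + " Cols:" + TESTING_LABELS.cols());

        // Train
        LogisticRegression log = LogisticRegression.create();
        log.train(TRAINING_DATA, Ml.ROW_SAMPLE, TRAINING_LABELS);

        // Predict: RESULTS receives one row (the predicted label) per test sample
        Mat RESULTS = new Mat();
        int flags = 0;
        log.predict(TESTING_DATA, RESULTS, flags);
    }

    private static void loadData(File[] directory, HOGDescriptor hog, int label, Mat data, List<Integer> labels_array,
                                 int start, int numImages, Size winStride, Size padding) {
        for(int i = start; i < start + numImages; i++){
            // HOGDescriptor works on 8-bit 1- or 3-channel images, so read as grayscale
            Mat image = Imgcodecs.imread(directory[i].getAbsolutePath(), Imgcodecs.IMREAD_GRAYSCALE); //for each file, read the image
            MatOfFloat training_feature = new MatOfFloat();
            MatOfPoint locations = new MatOfPoint();
            hog.compute(image, training_feature, winStride, padding, locations);
            // compute() returns an N x 1 column; append it to data as one 1 x N row
            data.push_back(training_feature.reshape(1, 1));
            labels_array.add(label);
        }
    }
}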


How many images do you have? What is the final HOG feature size here?

No answer, just some remarks:

  • LogisticRegression might be the wrong classifier here (in my experience, it does not like "long" features).
  • An empty winStride / padding will result in huge HOG features.
  • Please write a function to extract your data; don't replicate the same code 4 times.
  • You will receive a Mat with one row per test sample (feature vector) and 1 column from predict().
berak ( 2018-09-16 01:35:04 -0500 )
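To illustrate the layout point: OpenCV's ml models trained with Ml.ROW_SAMPLE expect one feature vector per row, and Mat.reshape() only reinterprets the existing row-major buffer; it never moves data. The plain-Java sketch below imitates that without OpenCV (the class and its reshape helper are hypothetical stand-ins for Mat.reshape(1, rows)). Stacking two 3-feature descriptors and reshaping to 2 rows keeps each sample in its own row, whereas reshaping to 3 rows (one row per feature) mixes values from different samples.

```java
import java.util.Arrays;

public class RowMajorReshape {
    // Reinterpret a row-major buffer as rows x (length / rows), like Mat.reshape(1, rows).
    public static float[][] reshape(float[] flat, int rows) {
        int cols = flat.length / rows;
        float[][] out = new float[rows][cols];
        for (int r = 0; r < rows; r++) {
            for (int c = 0; c < cols; c++) {
                out[r][c] = flat[r * cols + c];
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Two samples with three features each, stored one after another
        // (what pushing two 3x1 column descriptors onto a Mat produces: a 6x1 buffer).
        float[] stacked = {1, 2, 3, 10, 20, 30};

        // One sample per row (ROW_SAMPLE): 2 rows of 3 features.
        System.out.println(Arrays.deepToString(reshape(stacked, 2))); // [[1.0, 2.0, 3.0], [10.0, 20.0, 30.0]]
        // Reshaping to 3 rows does NOT give one sample per column: column 0 is 1, 3, 20.
        System.out.println(Arrays.deepToString(reshape(stacked, 3))); // [[1.0, 2.0], [3.0, 10.0], [20.0, 30.0]]
    }
}
```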

Hi @berak! I am training on 100 positive images and 100 negative images, and the HOG feature size is 864. Each image is 640 x 480 px. I have made the changes you suggested (making a training function and adding a non-empty padding and winStride to reduce the feature length). I am trying to use logistic regression because I have heard that an already trained model can be loaded and improved by adding more data without having to completely retrain it. I have also changed the code so that each column of the training data is a data sample. Thank you for helping me with this!

9261 ( 2018-09-16 15:39:04 -0500 )
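A quick check of where 864 comes from, assuming a full-image compute() call: the descriptor length is (features per window) times (number of windows). The plain-Java sketch below reproduces that arithmetic with the per-window formula used by HOGDescriptor.getDescriptorSize() and a sliding-window count over the padded image. It assumes the padding is already a multiple of OpenCV's internal cache stride (true for 8 px here, otherwise OpenCV rounds it up); the class and method names are made up for illustration.

```java
public class HogLength {
    // Features in one detection window: nbins * cellsPerBlock * blocksPerWindow.
    public static int perWindow(int winW, int winH, int blockW, int blockH,
                                int strideW, int strideH, int cellW, int cellH, int nbins) {
        int blocksX = (winW - blockW) / strideW + 1;
        int blocksY = (winH - blockH) / strideH + 1;
        int cellsPerBlock = (blockW / cellW) * (blockH / cellH);
        return nbins * cellsPerBlock * blocksX * blocksY;
    }

    // Number of windows that fit in the padded image when sliding by winStride.
    public static int windows(int imgW, int imgH, int padW, int padH,
                              int winW, int winH, int winStrideW, int winStrideH) {
        int nx = (imgW + 2 * padW - winW) / winStrideW + 1;
        int ny = (imgH + 2 * padH - winH) / winStrideH + 1;
        return nx * ny;
    }

    public static void main(String[] args) {
        // Parameters from the question: window 160x120, block 80x60, block stride 40x30,
        // cell 80x60, 8 bins; 640x480 image, winStride 128x128, padding 8x8.
        int per = perWindow(160, 120, 80, 60, 40, 30, 80, 60, 8);
        int n = windows(640, 480, 8, 8, 160, 120, 128, 128);
        System.out.println(per + " features/window x " + n + " windows = " + per * n);
        // prints "72 features/window x 12 windows = 864"
    }
}
```

For comparison, a winStride equal to the 80x60 cell size with no padding would give windows(640, 480, 0, 0, 160, 120, 80, 60) = 7 * 7 = 49 windows, i.e. 3528 features, which is the kind of growth the remark about winStride refers to.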

So, nothing is really wrong with your code (apart from the clumsy repetition).

And 864 features aren't that long (for comparison, the default "people" detector with a linear SVM uses 3780 features per 64x128 window).

And yes, LogisticRegression (and also ANN_MLP) can be retrained (or trained on consecutive, small batches).

So I think the main problem is the data: 100 positives/negatives might simply not be enough.

berak ( 2018-09-16 18:19:09 -0500 )

OK @berak, I got it to train by adding more data and changing the data format a little, but now when I test it, it returns a label of 1.0 for every entry in the testing data. (Also, sorry about the ugly code; I wanted to get something working relatively quickly before trying to make it nice.)

9261 ( 2018-09-17 23:46:48 -0500 )

OK, nice change!

One thing you could try is to normalize each HOG feature vector in your loadData() function (with its default arguments, Core.normalize() scales it to unit L2 norm), like:

 Core.normalize(training_feature, training_feature);
berak ( 2018-09-17 23:58:25 -0500 )
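For reference, with its default arguments (alpha = 1, NORM_L2), Core.normalize(src, dst) rescales a vector so that its L2 norm becomes 1, which puts descriptors from different images on a comparable scale. A plain-Java sketch of the same operation on a float[] (the L2Normalize class is hypothetical, not part of OpenCV):

```java
import java.util.Arrays;

public class L2Normalize {
    // Scale v so that sqrt(sum(v[i]^2)) == 1, like Core.normalize(src, dst) with NORM_L2.
    public static float[] l2Normalize(float[] v) {
        double sumSq = 0;
        for (float x : v) {
            sumSq += (double) x * x;
        }
        double norm = Math.sqrt(sumSq);
        float[] out = new float[v.length];
        if (norm == 0) {
            return out; // an all-zero input stays all zero
        }
        for (int i = 0; i < v.length; i++) {
            out[i] = (float) (v[i] / norm);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(l2Normalize(new float[] {3f, 4f}))); // [0.6, 0.8]
    }
}
```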

If that still does not work, hmm, fall back to using an ANN_MLP? The main difference is that it would need "one-hot" encoded labels:

Mat labels = Mat.zeros(nImages, 2, CvType.CV_32F);

labels.put(i, label, 1.0); // row i = sample index, column = its class label

(You have 2 columns, and put a 1 in the column of the label, like this):

1 0
0 1
1 0
berak ( 2018-09-18 00:03:33 -0500 )
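The one-hot layout above, sketched in plain Java without OpenCV: encode() turns integer labels into an nSamples x nClasses float matrix, and decode() maps each output row back to a class by taking the column with the largest value. Reading ANN_MLP's per-class outputs this way is an assumption about how you would use its predictions; the class and method names are hypothetical.

```java
import java.util.Arrays;

public class OneHot {
    // labels[i] must be in [0, numClasses): row i gets a 1 in column labels[i], 0 elsewhere.
    public static float[][] encode(int[] labels, int numClasses) {
        float[][] out = new float[labels.length][numClasses];
        for (int i = 0; i < labels.length; i++) {
            out[i][labels[i]] = 1f;
        }
        return out;
    }

    // For each output row, pick the column (class) with the highest value.
    public static int[] decode(float[][] outputs) {
        int[] labels = new int[outputs.length];
        for (int i = 0; i < outputs.length; i++) {
            int best = 0;
            for (int c = 1; c < outputs[i].length; c++) {
                if (outputs[i][c] > outputs[i][best]) {
                    best = c;
                }
            }
            labels[i] = best;
        }
        return labels;
    }

    public static void main(String[] args) {
        float[][] encoded = encode(new int[] {0, 1, 0}, 2);
        System.out.println(Arrays.deepToString(encoded)); // [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
        System.out.println(Arrays.toString(decode(new float[][] {{0.9f, 0.1f}, {0.2f, 0.7f}}))); // [0, 1]
    }
}
```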