Can't get Logistic Regression results to be anything other than 0's

asked 2020-01-07 13:16:38 -0500

realitysandbox

updated 2020-01-07 13:44:14 -0500

Hello, I'm working on an embedded project and I'm currently trying to learn how to use the OpenCV library to do simple logistic regression.

I am testing this on the Titanic dataset and have run into one major issue so far: the results matrix is always a vector of zeros after I call logreg->predict(trainData->getTestSamples(), results).

Here is the relevant code:

#include <iostream>
#include <opencv2/ml.hpp>

using namespace cv;
using namespace ml;
using namespace std;

// Build an untrained logistic regression model with the given hyperparameters.
Ptr<LogisticRegression> model(float learningRate, int iterations, int miniBatchSize) {
    Ptr<LogisticRegression> logreg = LogisticRegression::create();

    logreg->setLearningRate(learningRate);
    logreg->setIterations(iterations);
    logreg->setMiniBatchSize(miniBatchSize); // only used with LogisticRegression::MINI_BATCH
    logreg->setTrainMethod(LogisticRegression::BATCH);
    logreg->setRegularization(LogisticRegression::REG_L2);

    return logreg;
}

int main(int, char**)
{   

    const Ptr<TrainData> trainData = TrainData::loadFromCSV("data/train_cleaned.csv",
        1, // lines to skip
        0, // index of label
        -1 // 1 response per line
    );

    trainData->setTrainTestSplitRatio(0.8);

    // learning rate, iterations, mini-batch size
    Ptr<LogisticRegression> logreg = model(0.001, 10, 1);

    logreg->train(trainData);

    Mat results;

    logreg->predict(trainData->getTestSamples(), results);

    cout << results.t() << endl;

    return 0;
}

I thought my data might not be processed correctly, so I tried several smaller values for the train/test split ratio and verified that the number of training and test samples changed accordingly. The predictions did not change; I only got a longer vector of zeros.

Example output: [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]

Example input (data/train_cleaned.csv):

# "Survived","Pclass","Sex","Age","SibSp","Parch","Fare"

0.00000,1.00000,0.00000,0.27500,0.20000,0.00000,0.01415

The model is also correctly saved after training:

%YAML:1.0
---
opencv_ml_lr:
   format: 3
   classifier: Logistic Regression Classifier
   alpha: 1.0000000000000000e-03
   iterations: 1000
   norm: 1
   train_method: 0
   learnt_thetas: !!opencv-matrix
      rows: 1
      cols: 7
      dt: f
      data: [ -1.31384659e-04, -1.67027669e-04, 1.30452652e-04,
          -5.84131885e-05, -1.56885471e-05, -3.24403231e-07,
          1.01427850e-05 ]
   n_labels: !!opencv-matrix
      rows: 2
      cols: 1
      dt: i
      data: [ 0, 1 ]
   o_labels: !!opencv-matrix
      rows: 2
      cols: 1
      dt: i
      data: [ 0, 1 ]

Perhaps there are errors in the model.


Comments


imho 10 iterations are not enough. try like 5000

can you put the csv somewhere ? (kaggle's data is behind a login wall)

berak (2020-01-08 05:02:45 -0500)

data: http://s000.tinyupload.com/index.php?...

I've tried more iterations. The issue seems to be that the sigmoid values returned by calc_sigmoid in ml/lr.cpp are always very close to 0.5 but slightly below it. With more iterations, some predictions move farther from 0.5 toward 0, so the output is still zero.

Thank you :)

realitysandbox (2020-01-08 11:01:14 -0500)
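
The "close to 0.5 but always lower" observation can be checked by hand. The following is a minimal sketch in standard C++ (no OpenCV) that plugs the learnt_thetas from the saved model and the example CSV row from the question into the logistic function. It assumes the first theta is the bias term, which would be consistent with 7 thetas for 6 feature columns; `sigmoid`, `predictProbability`, `kThetas` and `kExampleRow` are hypothetical names for illustration, not OpenCV API.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Standard logistic function.
double sigmoid(double z) {
    return 1.0 / (1.0 + std::exp(-z));
}

// Hypothetical helper: thetas[0] is ASSUMED to be the bias, followed by one
// weight per feature column.
double predictProbability(const std::vector<double>& thetas,
                          const std::vector<double>& features) {
    double z = thetas[0];
    for (std::size_t i = 0; i < features.size(); ++i) {
        z += thetas[i + 1] * features[i];
    }
    return sigmoid(z);
}

// learnt_thetas copied from the saved model in the question.
const std::vector<double> kThetas = {
    -1.31384659e-04, -1.67027669e-04, 1.30452652e-04, -5.84131885e-05,
    -1.56885471e-05, -3.24403231e-07, 1.01427850e-05};

// Feature columns of the example CSV row (the "Survived" label is dropped):
// Pclass, Sex, Age, SibSp, Parch, Fare.
const std::vector<double> kExampleRow = {1.0, 0.0, 0.275, 0.2, 0.0, 0.01415};
```

Under that assumption, `predictProbability(kThetas, kExampleRow)` comes out just under 0.5, because every theta is on the order of 1e-4 or smaller. Thresholding at 0.5 then gives class 0, which matches the all-zero output.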

test here uses lr=1.0, iter=10001 and batch=10 to solve the iris dataset.

did you try other ml algos, like SVM ?

berak (2020-01-09 04:45:47 -0500)
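
To illustrate why the learning rate and iteration count matter here, the sketch below runs plain batch gradient descent for a one-feature logistic regression in standard C++. It is not OpenCV's implementation (no regularization, no mini-batches), and the toy data is made up for illustration. The two hyperparameter settings mirror the ones mentioned in this thread: lr=0.001 with 10 iterations, and lr=1.0 with 10001 iterations.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Parameters of a one-feature logistic model, initialised to zero.
struct Model {
    double bias = 0.0;
    double weight = 0.0;
};

double logistic(double z) {
    return 1.0 / (1.0 + std::exp(-z));
}

// Plain batch gradient descent on the mean log-loss (illustrative only).
Model trainLogistic(const std::vector<double>& x, const std::vector<int>& y,
                    double learningRate, int iterations) {
    Model m;
    const double n = static_cast<double>(x.size());
    for (int it = 0; it < iterations; ++it) {
        double gradBias = 0.0;
        double gradWeight = 0.0;
        for (std::size_t i = 0; i < x.size(); ++i) {
            // Prediction error for sample i.
            const double err = logistic(m.bias + m.weight * x[i]) - y[i];
            gradBias += err;
            gradWeight += err * x[i];
        }
        m.bias -= learningRate * gradBias / n;
        m.weight -= learningRate * gradWeight / n;
    }
    return m;
}

// Threshold the predicted probability at 0.5.
int predictLabel(const Model& m, double x) {
    return logistic(m.bias + m.weight * x) >= 0.5 ? 1 : 0;
}

// Made-up toy data: more 0 labels than 1 labels, separable around x = 0.6.
const std::vector<double> kToyX = {0.0, 0.1, 0.2, 0.3, 0.9, 1.0};
const std::vector<int> kToyY = {0, 0, 0, 0, 1, 1};
```

With `trainLogistic(kToyX, kToyY, 0.001, 10)` the parameters barely move away from zero, and because class 0 is the majority the bias drifts slightly negative, so every toy sample is predicted as 0. With `trainLogistic(kToyX, kToyY, 1.0, 10001)` the same loop separates the two classes. Real data such as the Titanic set is not cleanly separable, so this only shows the direction of the effect.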

Thanks for the response. I managed to get it working in C, so the help is not needed anymore.

realitysandbox (2020-01-09 20:50:20 -0500)