
Can't get Logistic Regression results to be anything other than 0s

Hello, I'm working on an embedded project and I'm currently trying to learn how to use the OpenCV library to do simple logistic regression.

I am testing this on the titanic dataset and I've run into one major issue so far: the results matrix is always set to a vector of zeros after I call logreg->predict(trainData->getTestSamples(), results).

Here is the relevant code:

#include <iostream>
#include <opencv2/ml.hpp>

using namespace cv;
using namespace ml;
using namespace std;

Ptr<LogisticRegression> model(float learningRate, int iterations, int miniBatchSize) {
    Ptr<LogisticRegression> logreg = LogisticRegression::create();

    logreg->setLearningRate(learningRate);
    logreg->setIterations(iterations);
    logreg->setMiniBatchSize(miniBatchSize); // only used by LogisticRegression::MINI_BATCH
    logreg->setTrainMethod(LogisticRegression::BATCH);
    logreg->setRegularization(LogisticRegression::REG_L2);

    return logreg;
}

int main(int, char**)
{
    const Ptr<TrainData> trainData = TrainData::loadFromCSV("data/train_cleaned.csv",
        1,  // headerLineCount: skip the header line
        0,  // responseStartIdx: the label is in column 0
        -1  // responseEndIdx: -1 means a single response column
    );

    trainData->setTrainTestSplitRatio(0.8);

    Ptr<LogisticRegression> logreg = model(0.001, 10, 1); // learningRate, iterations, miniBatchSize

    logreg->train(trainData);

    Mat results;

    logreg->predict(trainData->getTestSamples(), results);

    cout << results.t() << endl;

    return 0;
}

I suspected my data wasn't being processed correctly, so I tried several smaller values for the train/test split ratio and verified that the training and test samples changed accordingly. The predictions did not change; I only got a longer vector of zeros.

Example output: [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
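To illustrate how a short run with a small learning rate can yield nothing but zeros, here is a minimal plain-C++ sketch of batch gradient descent for binary logistic regression. It is not OpenCV's implementation: it assumes weights initialized to zero, a bias term, no regularization (the post uses REG_L2), and a 0.5 probability threshold for class 1. The helpers `trainBatch` and `predictLabel` are hypothetical names used only for this illustration.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Logistic function: maps any real z to a probability in (0, 1).
double sigmoid(double z) { return 1.0 / (1.0 + std::exp(-z)); }

// Hypothetical helper: plain batch gradient descent on the mean logistic loss.
// theta[0] is the bias and theta[j + 1] is the weight of feature j.
// Weights start at zero; no regularization is applied.
std::vector<double> trainBatch(const std::vector<std::vector<double>>& X,
                               const std::vector<double>& y,
                               double learningRate, int iterations) {
    assert(!X.empty() && X.size() == y.size());
    const std::size_t m = X.size();
    const std::size_t n = X[0].size();
    std::vector<double> theta(n + 1, 0.0);
    for (int it = 0; it < iterations; ++it) {
        std::vector<double> grad(n + 1, 0.0);
        for (std::size_t i = 0; i < m; ++i) {
            double z = theta[0];
            for (std::size_t j = 0; j < n; ++j) z += theta[j + 1] * X[i][j];
            const double err = sigmoid(z) - y[i];  // predicted probability minus label
            grad[0] += err;
            for (std::size_t j = 0; j < n; ++j) grad[j + 1] += err * X[i][j];
        }
        // One full-batch step against the averaged gradient.
        for (std::size_t j = 0; j <= n; ++j) theta[j] -= learningRate * grad[j] / m;
    }
    return theta;
}

// Hypothetical helper: class 1 only if the predicted probability exceeds 0.5.
int predictLabel(const std::vector<double>& theta, const std::vector<double>& x) {
    double z = theta[0];
    for (std::size_t j = 0; j < x.size(); ++j) z += theta[j + 1] * x[j];
    return sigmoid(z) > 0.5 ? 1 : 0;
}
```

On a toy set such as `X = {{0.0}, {0.1}, {0.2}, {0.3}, {0.9}, {1.0}}` with `y = {0, 0, 0, 0, 1, 1}`, `trainBatch(X, y, 0.001, 10)` leaves both weights within 0.002 of zero with a slightly negative bias, so `predictLabel` returns 0 for every sample, while `trainBatch(X, y, 1.0, 5000)` classifies all six samples correctly. Because most labels are 0, the first steps push the bias down before the feature weights have grown, so with too few or too small steps every probability may stay just under 0.5.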

Example input (data/train_cleaned.csv):

# "Survived","Pclass","Sex","Age","SibSp","Parch","Fare"

0.00000,1.00000,0.00000,0.27500,0.20000,0.00000,0.01415
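With `loadFromCSV("data/train_cleaned.csv", 1, 0, -1)` the first line is skipped, column 0 (`Survived`) becomes the single response, and the remaining six columns become features. The following standard-library sketch splits one data row the same way, which can help sanity-check a file before handing it to OpenCV. `Row` and `splitRow` are hypothetical names, and the sketch assumes plain comma-separated numbers with no quoting or missing values.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical type: one parsed CSV row.
struct Row {
    float label;                  // column 0 ("Survived")
    std::vector<float> features;  // columns 1..N
};

// Split "label,f1,f2,..." into the label (column 0) and the feature values.
// Assumes plain comma-separated numbers with no quoting or missing values.
Row splitRow(const std::string& line) {
    std::stringstream ss(line);
    std::string field;
    std::vector<float> values;
    while (std::getline(ss, field, ',')) values.push_back(std::stof(field));
    assert(!values.empty());
    return Row{values.front(), std::vector<float>(values.begin() + 1, values.end())};
}
```

For the example row above, `splitRow` yields label `0` and the six features `Pclass`, `Sex`, `Age`, `SibSp`, `Parch`, `Fare`.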