Hello, I'm working on an embedded project and I'm currently trying to learn how to use the OpenCV library to do simple logistic regression.
I am testing this on the Titanic dataset and have run into one major issue so far: the results matrix is always a vector of zeros after I call logreg->predict(trainData->getTestSamples(), results).
Here is the relevant code:
#include <iostream>
#include <opencv2/ml.hpp>

using namespace cv;
using namespace ml;
using namespace std;

// Build an untrained logistic regression model with the given hyperparameters.
Ptr<LogisticRegression> model(float learningRate, int iterations, int miniBatchSize) {
    Ptr<LogisticRegression> logreg = LogisticRegression::create();
    logreg->setLearningRate(learningRate);
    logreg->setIterations(iterations);
    logreg->setMiniBatchSize(miniBatchSize);
    logreg->setTrainMethod(LogisticRegression::BATCH);
    logreg->setRegularization(LogisticRegression::REG_L2);
    return logreg;
}

int main(int, char**)
{
    const Ptr<TrainData> trainData = TrainData::loadFromCSV("data/train_cleaned.csv",
        1,  // lines to skip
        0,  // index of label
        -1  // 1 response per line
    );
    // Use 80% of the rows for training and keep the rest as test samples.
    trainData->setTrainTestSplitRatio(0.8);

    Ptr<LogisticRegression> logreg = model(0.001, 10, 1);
    logreg->train(trainData);

    // Predict on the held-out samples and print the labels as a single row.
    Mat results;
    logreg->predict(trainData->getTestSamples(), results);
    cout << results.t() << endl;
    return 0;
}
I suspected that my data wasn't being loaded correctly, so I set the train/test split ratio to several smaller values and verified that the number of training and testing samples changed accordingly. The predicted outputs were still all zeros; the only difference was that the vector got longer.
Example output: [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
Example input (data/train_cleaned.csv):
# "Survived","Pclass","Sex","Age","SibSp","Parch","Fare"
0.00000,1.00000,0.00000,0.27500,0.20000,0.00000,0.01415
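
One thing I have not ruled out is the training settings themselves (learning rate 0.001, 10 iterations). To reason about that independently of OpenCV, here is a minimal sketch of plain batch gradient descent on a one-feature logistic model. The helpers fitToy and predictToy and the four-sample toy data are made up for illustration, and the 0.5 threshold is my assumption about how a probability becomes a label, not a statement about OpenCV's internals.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Logistic (sigmoid) function.
double sigmoid(double z) { return 1.0 / (1.0 + std::exp(-z)); }

// Hypothetical helper: batch gradient descent on a model with one bias and
// one weight, both starting at zero. Returns {bias, weight}.
std::vector<double> fitToy(const std::vector<double>& x,
                           const std::vector<int>& y,
                           double learningRate, int iterations) {
    assert(x.size() == y.size());
    double b = 0.0, w = 0.0;
    const double n = static_cast<double>(x.size());
    for (int it = 0; it < iterations; ++it) {
        double gradB = 0.0, gradW = 0.0;
        for (std::size_t i = 0; i < x.size(); ++i) {
            const double err = sigmoid(b + w * x[i]) - y[i];
            gradB += err;         // gradient contribution for the bias
            gradW += err * x[i];  // gradient contribution for the weight
        }
        b -= learningRate * gradB / n;
        w -= learningRate * gradW / n;
    }
    return {b, w};
}

// Hypothetical helper: label 1 if the predicted probability is at least 0.5
// (assumed threshold), otherwise label 0.
int predictToy(const std::vector<double>& theta, double x) {
    return sigmoid(theta[0] + theta[1] * x) >= 0.5 ? 1 : 0;
}
```

With the toy data x = {0, 0, 0, 1} and y = {0, 0, 0, 1}, fitToy(x, y, 0.001, 10) leaves both x = 0 and x = 1 on the class-0 side of the threshold, because the majority class dominates the first few small steps. fitToy(x, y, 0.5, 5000) separates them. This does not prove that the same thing is happening in my OpenCV code, but it suggests that a larger learning rate and more iterations are worth trying.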