OpenCV Q&A Forum - RSS feed
http://answers.opencv.org/questions/
Copyright [OpenCV foundation](http://www.opencv.org), 2012-2018.

**ANN_MLP using UPDATE_WEIGHTS to graph error - broken**
http://answers.opencv.org/question/209794/ann_mlp-using-update_weights-to-graph-error-broken/

I ran into this problem while trying to produce a learning curve for an MLP I was using to predict 4 output values across 30,000 samples. I wanted to use UPDATE_WEIGHTS to output the error after each training epoch, so that I could graph it and look at trends.
When training the network with termination criteria COUNT = 1000, the network reached ~5% error. The problem is that when I used UPDATE_WEIGHTS to iteratively train the network one epoch at a time, the error did not converge to the same value, or with a similar trend.
I provided code below for a simple example that illustrates the same UPDATE_WEIGHTS issue, so you can clearly see the problem. The example uses an MLP to learn how to add two numbers, and compares iteratively training the network with UPDATE_WEIGHTS nEpochs times (network1) against retraining the network from scratch with termination criteria COUNT = nEpochs (network2).
OpenCV 4.0.1
MacBook Pro, 64-bit
Eclipse C++
// create train data
int nTrainRows = 1000;
cv::Mat trainMat(nTrainRows, 2, CV_32F);
cv::Mat labelsMat(nTrainRows, 1, CV_32F);
for(int i = 0; i < nTrainRows; i++) {
double rand1 = rand() % 100;
double rand2 = rand() % 100;
trainMat.at<float>(i, 0) = rand1;
trainMat.at<float>(i, 1) = rand2;
labelsMat.at<float>(i, 0) = rand1 + rand2;
}
// create test data
int nTestRows = 100;
cv::Mat testMat(nTestRows, 2, CV_32F);
cv::Mat truthsMat(nTestRows, 1, CV_32F);
for(int i = 0; i < nTestRows; i++) {
double rand1 = rand() % 100;
double rand2 = rand() % 100;
testMat.at<float>(i, 0) = rand1;
testMat.at<float>(i, 1) = rand2;
truthsMat.at<float>(i, 0) = rand1 + rand2;
}
// initialize network1 and set network parameters
cv::Ptr<cv::ml::ANN_MLP > network1 = cv::ml::ANN_MLP::create();
cv::Mat layersMat(1, 2, CV_32SC1);
layersMat.col(0) = cv::Scalar(trainMat.cols);
layersMat.col(1) = cv::Scalar(labelsMat.cols);
network1->setLayerSizes(layersMat);
network1->setActivationFunction(cv::ml::ANN_MLP::ActivationFunctions::SIGMOID_SYM);
network1->setTermCriteria(cv::TermCriteria(cv::TermCriteria::COUNT + cv::TermCriteria::EPS, 1, 0));
cv::Ptr<cv::ml::TrainData> trainData = cv::ml::TrainData::create(trainMat, cv::ml::ROW_SAMPLE, labelsMat, cv::Mat(), cv::Mat(), cv::Mat(), cv::Mat());
network1->train(trainData);
// loop through each epoch, one at a time, and compare error between the two methods
for(int nEpochs = 2; nEpochs <= 20; nEpochs++) {
// train network1 with one more epoch
network1->train(trainData,cv::ml::ANN_MLP::UPDATE_WEIGHTS);
cv::Mat predictions;
network1->predict(testMat, predictions);
double totalError = 0;
for(int i = 0; i < nTestRows; i++)
totalError += std::abs( truthsMat.at<float>(i, 0) - predictions.at<float>(i, 0) ); // std::abs, not int abs()
double aveError = totalError / (double) nTestRows;
//recreate network2
cv::Ptr<cv::ml::ANN_MLP > network2 = cv::ml::ANN_MLP::create();
network2->setLayerSizes(layersMat);
network2->setActivationFunction(cv::ml::ANN_MLP::ActivationFunctions::SIGMOID_SYM);
network2->setTermCriteria(cv::TermCriteria(cv::TermCriteria::COUNT + cv::TermCriteria::EPS, nEpochs, 0));
// train network2 from scratch, specifying to train with nEpochs
network2->train(trainData);
network2->predict(testMat, predictions);
totalError = 0;
for(int i = 0; i < nTestRows; i++)
totalError += std::abs( truthsMat.at<float>(i, 0) - predictions.at<float>(i, 0) ); // std::abs, not int abs()
aveError = totalError / (double) nTestRows;
}
I graphed the average error vs the number of training epochs used:
![image description](/upfiles/15518089051345973.png)
You can see that network1 (using UPDATE_WEIGHTS) and network2 (using COUNT) behave very differently even though the number of training epochs is the same. The error from network2 converges faster, and network1 converges to a higher error. I cannot find a reason why this would be the case; shouldn't they be the same?
-Tim

WreckItTim, Mon, 04 Mar 2019 15:45:53 -0600
http://answers.opencv.org/question/209794/

**MLP Same Data Different Results**
http://answers.opencv.org/question/64039/mlp-same-data-different-results/

Let me simplify this question.
If I run OpenCV MLP train and classify consecutively on the same data, I get different results. That is, if I put training a new MLP on the same training data and classifying the same test data inside a for loop, each iteration gives me different results, even though I create a new MLP object on each iteration. However, if instead of using a for loop I just run the program a few times, restarting it after each train-and-classify, the results are exactly the same.
So the question is: does OpenCV carry over weights, variables, or something of the sort from previous MLP training runs, even though it is not the same MLP object? Does anyone know why it does this?
Thanks for the time!
-Tim

TKJ, Fri, 12 Jun 2015 16:04:14 -0500
http://answers.opencv.org/question/64039/

**MLP sigmoid output +/-epsilon**
http://answers.opencv.org/question/42027/mlp-sigmoid-output-epsilon/

This may seem like a duplicate of [this question](http://answers.opencv.org/question/41612/opencv-mlp-with-sigmoid-neurons-output-range/), but the difference is that there I was asking whether the output range is [-1,1] or [0,1]. I have accepted that the range is [0,1] if the activation function is the sigmoid with alpha != 0 and beta != 0 (as stated in the [documentation](http://docs.opencv.org/modules/ml/doc/neural_networks.html)). Anyway, it seems to me that the output range is more like [0-eps, 1+eps].
My question is: Why is there a small epsilon and how can I turn this off?
One explanation I could think of is that the output neurons aren't sigmoid units but linear units. Although the documentation explicitly states that all neurons have the same activation function, this would explain the behavior.
Here is a small example that shows what I mean:
#include <iostream>
#include <opencv2/core/core.hpp>
#include <opencv2/ml/ml.hpp>
using namespace cv;
using namespace std;
int main() {
int POS = 1, NEG = 0;
int SAMPLES = 100;
float SPLIT = 0.8;
float C_X = 0.5;
float C_Y = 0.5;
float R = 0.3;
Mat X(SAMPLES, 2, CV_32FC1);
Mat Y(SAMPLES, 1, CV_32FC1);
randu(X, 0, 1);
for(int i = 0; i < SAMPLES; i++){
Y.at<float>(i,0) = pow((X.at<float>(i,0) - C_X),2) + pow((X.at<float>(i,1) - C_Y),2) < pow(R,2) ? POS : NEG;
}
Mat X_train = X(Range(0, (int)(SAMPLES*SPLIT)), Range::all());
Mat Y_train = Y(Range(0, (int)(SAMPLES*SPLIT)), Range::all());
Mat X_test = X(Range((int)(SAMPLES*SPLIT), SAMPLES), Range::all());
Mat Y_test = Y(Range((int)(SAMPLES*SPLIT), SAMPLES), Range::all());
CvANN_MLP_TrainParams params(
cvTermCriteria(CV_TERMCRIT_ITER+CV_TERMCRIT_EPS, 1000, 0.000001),
CvANN_MLP_TrainParams::BACKPROP,
0.1,
0.1);
Mat layers = (Mat_<int>(3,1) << 2, 4, 1);
CvANN_MLP net(layers, CvANN_MLP::SIGMOID_SYM, 1, 1);
net.train(X_train, Y_train, Mat(), Mat(), params);
Mat predictions(Y_test.size(), CV_32F);
net.predict(X_test, predictions);
cout << predictions << endl;
Mat error = predictions-Y_test;
multiply(error, error, error);
float mse = sum(error)[0]/error.rows;
cout << "MSE: " << mse << endl;
return 0;
}
For me this produces the following output:
[0.9940818;
0.087859474;
0.072328083;
0.032660298;
-0.0090373717;
0.056480117;
0.13302;
-0.025581671;
0.32763073;
1.0263158;
0.29676101;
0.056798562;
0.070351392;
1.0213233;
0.006240299;
0.96525788;
0.071746305;
1.0048869;
-0.015669812;
0.0023532249]
MSE: 0.0326775
As you can see, there are values just below 0 and above 1.
thomas, Mon, 15 Sep 2014 06:51:13 -0500
http://answers.opencv.org/question/42027/

**OpenCV MLP with Sigmoid Neurons, Output range**
http://answers.opencv.org/question/41612/opencv-mlp-with-sigmoid-neurons-output-range/

I have searched for answers to the following question here, on SO, and on Google, but haven't found anything, so here is my situation:
I want to build an MLP that learns some similarity function. I have training and test samples, and the MLP is set up and running. My problem is what value range the teacher outputs provided to the net should come from.
Here is the relevant part of my code:
CvANN_MLP_TrainParams params(
cvTermCriteria(CV_TERMCRIT_ITER+CV_TERMCRIT_EPS, 1000, 0.000001),
CvANN_MLP_TrainParams::BACKPROP,
0.1,
0.1);
Mat layers = (Mat_<int>(3,1) << FEAT_SIZE, H_NEURONS, 1);
CvANN_MLP net(layers, CvANN_MLP::SIGMOID_SYM, 1, 1);
int iter = net.train(X, Y, Mat(), Mat(), params);
net.predict(X_test, predictions);
The number of input and hidden neurons is set elsewhere, and the net has 1 output neuron. X, Y, and X_test are Mats containing the training and test samples; no problem here. The problem is what value range my Y's have to come from, and what value range the predictions will come from.
In the [documentation](http://docs.opencv.org/modules/ml/doc/neural_networks.html) I have found the following statements:
For training:
>If you are using the default cvANN_MLP::SIGMOID_SYM activation function then the output should be in the range [-1,1], instead of [0,1], for optimal results.
Since I'm NOT using the default sigmoid function (the one with alpha=0 and beta=0), I'm providing my Y's from [0,1]. Is this right, or do they mean something else by 'default sigmoid function'? I'm asking because for prediction they explicitly mention alpha and beta:
>If you are using the default cvANN_MLP::SIGMOID_SYM activation function with the default parameter values fparam1=0 and fparam2=0 then the function used is y = 1.7159*tanh(2/3 * x), so the output will range from [-1.7159, 1.7159], instead of [0,1].
Again, since I'm not using the default sigmoid function, I assume to get predictions from [0,1]. Am I right so far?
What is confusing me here is that I've found [another question](http://stackoverflow.com/questions/19140860/neural-network-mlp-trouble/22089428#22089428) regarding the output range of OpenCV's sigmoid function, that says the range has to be [-1,1].
And now comes the real confusion: When I train the net and let it make some predictions, I get values slightly larger than 1 (around 1.03), regardless if my Y's come from [0,1] or [-1,1]. And this shouldn't happen in either case.
Could somebody please enlighten me? Am I missing something here?
Thanks in advance.
**EDIT:**
To make things very clear, I came up with a small example that shows the problem:
#include <iostream>
#include <opencv2/core/core.hpp>
#include <opencv2/ml/ml.hpp>
using namespace cv;
using namespace std;
int main() {
int POS = 1;
int NEG = -1;
int SAMPLES = 100;
float SPLIT = 0.8;
float C_X = 0.5;
float C_Y = 0.5;
float R = 0.3;
Mat X(SAMPLES, 2, CV_32FC1);
Mat Y(SAMPLES, 1, CV_32FC1);
randu(X, 0, 1);
for(int i = 0; i < SAMPLES; i++){
Y.at<float>(i,0) = pow((X.at<float>(i,0) - C_X),2) + pow((X.at<float>(i,1) - C_Y),2) < pow(R,2) ? POS : NEG;
}
Mat X_train = X(Range(0, (int)(SAMPLES*SPLIT)), Range::all());
Mat Y_train = Y(Range(0, (int)(SAMPLES*SPLIT)), Range::all());
Mat X_test = X(Range((int)(SAMPLES*SPLIT), SAMPLES), Range::all());
Mat Y_test = Y(Range((int)(SAMPLES*SPLIT), SAMPLES), Range::all());
CvANN_MLP_TrainParams params(
cvTermCriteria(CV_TERMCRIT_ITER+CV_TERMCRIT_EPS, 1000, 0.000001),
CvANN_MLP_TrainParams::BACKPROP,
0.1,
0.1);
Mat layers = (Mat_<int>(3,1) << 2, 4, 1);
CvANN_MLP net(layers, CvANN_MLP::SIGMOID_SYM, 1, 1);
net.train(X_train, Y_train, Mat(), Mat(), params);
Mat predictions(Y_test.size(), CV_32F);
net.predict(X_test, predictions);
cout << predictions << endl;
Mat error = predictions-Y_test;
multiply(error, error, error);
float mse = sum(error)[0]/error.rows;
cout << "MSE: " << mse << endl;
return 0;
}
This code generates a set of random points from the unit square and assigns the label POS or NEG to each, depending on whether it lies inside the circle given by C_X, C_Y, and R. Then a training and a test set are generated and the MLP is trained. Now we have two situations:
1. POS = 1, NEG = -1:
Output is provided to the net as it should be for tanh neurons (from [-1,1]), and I expect predictions from that range. But I also get predictions like -1.018 or 1.052. The mean squared error in this case was 0.13071 for me.
2. POS = 1, NEG = 0:
The output is given like it is said to be optimal (at least I understand the documentation that way). And since I'm not using the default sigmoid function I expect predictions from [0,1]. But I also get values like 1.0263158 and even negative ones. The MSE in this case gets better with 0.0326775.
I know this example is a classification problem, and normally I would just round the values to the closest label, but I want to learn a similarity function and have to rely on the predictions coming from some fixed range.

thomas, Wed, 10 Sep 2014 09:31:33 -0500
http://answers.opencv.org/question/41612/