ANN_MLP using UPDATE_WEIGHTS to graph error - broken

asked 2019-03-04 15:45:53 -0500

WreckItTim

updated 2019-03-05 12:23:32 -0500

I ran into this problem while trying to make a learning curve of an MLP I was using to predict 4 output values across 30,000 samples. I wanted to use UPDATE_WEIGHTS to output the error after each training epoch. That way I can graph it and look at trends.

When I trained the network with the termination criteria set to COUNT=1000, it reached ~5% error. The problem is that when I used UPDATE_WEIGHTS to train the network iteratively, 1 epoch at a time, the error did not converge to the same value or follow a similar trend.

I provided code below for a simple example that illustrates the same UPDATE_WEIGHTS issue, so you can clearly see what the problem is. The example uses an MLP to learn how to add two numbers, and compares iteratively training the network with UPDATE_WEIGHTS nEpochs times (network1) to retraining the network from scratch with termination criteria COUNT = nEpochs (network2).

OpenCV 4.0.1
MacBook Pro 64 bit
Eclipse C++

 // create train data
 int nTrainRows = 1000;
 cv::Mat trainMat(nTrainRows, 2, CV_32F);
 cv::Mat labelsMat(nTrainRows, 1, CV_32F);
 for(int i = 0; i < nTrainRows; i++) {
     double rand1 = rand() % 100;
     double rand2 = rand() % 100;
     trainMat.at<float>(i, 0) = rand1;
     trainMat.at<float>(i, 1) = rand2;
     labelsMat.at<float>(i, 0) = rand1 + rand2;
 }

 // create test data
 int nTestRows = 100;
 cv::Mat testMat(nTestRows, 2, CV_32F);
 cv::Mat truthsMat(nTestRows, 1, CV_32F);
 for(int i = 0; i < nTestRows; i++) {
     double rand1 = rand() % 100;
     double rand2 = rand() % 100;
     testMat.at<float>(i, 0) = rand1;
     testMat.at<float>(i, 1) = rand2;
     truthsMat.at<float>(i, 0) = rand1 + rand2;
 }

 // initialize network1 and set network parameters
 cv::Ptr<cv::ml::ANN_MLP > network1 = cv::ml::ANN_MLP::create();
 cv::Mat layersMat(1, 2, CV_32SC1);
 layersMat.col(0) = cv::Scalar(trainMat.cols);
 layersMat.col(1) = cv::Scalar(labelsMat.cols);
 network1->setLayerSizes(layersMat);
 network1->setActivationFunction(cv::ml::ANN_MLP::ActivationFunctions::SIGMOID_SYM);
 network1->setTermCriteria(cv::TermCriteria(cv::TermCriteria::COUNT + cv::TermCriteria::EPS, 1, 0));
 cv::Ptr<cv::ml::TrainData> trainData = cv::ml::TrainData::create(trainMat,cv::ml::ROW_SAMPLE,labelsMat,cv::Mat(),cv::Mat(),cv::Mat(),cv::Mat());
 network1->train(trainData);

 // loop through each epoch, one at a time, and compare error between the two methods
 for(int nEpochs = 2; nEpochs <= 20; nEpochs++) {
      // train network1 with one more epoch
      network1->train(trainData,cv::ml::ANN_MLP::UPDATE_WEIGHTS);
      cv::Mat predictions;
      network1->predict(testMat, predictions);
      double totalError = 0;
      for(int i = 0; i < nTestRows; i++)
          totalError += std::abs( truthsMat.at<float>(i, 0) - predictions.at<float>(i, 0) );
      double aveError = totalError / (double) nTestRows;

      //recreate network2 
      cv::Ptr<cv::ml::ANN_MLP > network2 = cv::ml::ANN_MLP::create();
      network2->setLayerSizes(layersMat);
      network2->setActivationFunction(cv::ml::ANN_MLP::ActivationFunctions::SIGMOID_SYM);
      network2->setTermCriteria(cv::TermCriteria(cv::TermCriteria::COUNT + cv::TermCriteria::EPS, nEpochs, 0));

      // train network2 from scratch, specifying to train with nEpochs
      network2->train(trainData);
      network2->predict(testMat, predictions);
      totalError = 0;
      for(int i = 0; i < nTestRows; i++) 
          totalError += std::abs( truthsMat.at<float>(i, 0) - predictions.at<float>(i, 0) );
      aveError = totalError / (double) nTestRows;
 }
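
The average-error computation appears twice in the loop. As a minimal sketch (not part of the original post), it could be factored into one helper. The name `meanAbsError` is hypothetical, and it takes `std::vector<float>` rather than `cv::Mat` so that it compiles without OpenCV. It calls `std::abs` from `<cmath>` so the floating-point overload is used rather than the integer `abs`.

```cpp
#include <cmath>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical helper: mean absolute error between ground truth and predictions.
// Returns 0.0 for empty input; throws if the two vectors differ in length.
double meanAbsError(const std::vector<float>& truths,
                    const std::vector<float>& predictions) {
    if (truths.size() != predictions.size())
        throw std::invalid_argument("truths and predictions must have equal length");
    if (truths.empty())
        return 0.0;
    double total = 0.0;
    for (std::size_t i = 0; i < truths.size(); ++i)
        total += std::abs(static_cast<double>(truths[i]) - predictions[i]);
    return total / static_cast<double>(truths.size());
}
```

With this, each network's error could be stored in its own variable (one for network1, one for network2) instead of reusing aveError for both.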

I graphed the average error vs. the number of training epochs used: [image]
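
For anyone who wants to reproduce the graph, here is a rough sketch of how the per-epoch errors could be turned into CSV text for a plotting tool. It is not from the original post: the function name `errorsToCsv`, the column names, and the idea of collecting the errors into two vectors are all assumptions made for illustration.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: one CSV row per epoch, holding the error of both networks.
// firstEpoch is the epoch number that corresponds to index 0 of the vectors.
std::string errorsToCsv(int firstEpoch,
                        const std::vector<double>& network1Errors,
                        const std::vector<double>& network2Errors) {
    std::ostringstream out;
    out << "epoch,network1,network2\n";
    // Stop at the shorter vector so a length mismatch never reads out of bounds.
    for (std::size_t i = 0; i < network1Errors.size() && i < network2Errors.size(); ++i) {
        out << (firstEpoch + static_cast<int>(i)) << ','
            << network1Errors[i] << ','
            << network2Errors[i] << '\n';
    }
    return out.str();
}
```

In the loop from the question, firstEpoch would be 2, since network1 has already been trained for one epoch before the loop starts.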

You can see that network1 (using ...

Comments

Have you tried to do the XOR problem?

sjhalayka ( 2019-03-04 17:15:46 -0500 )

Can you try with those lines for the seed?

LBerger ( 2019-03-05 02:24:02 -0500 )

It’s the same results if I comment out the random seed lines.

WreckItTim ( 2019-03-05 02:25:53 -0500 )

What would be the point of trying the XOR problem?

WreckItTim ( 2019-03-05 02:27:38 -0500 )

Let me try to explain (with my bad English): in the source code, the data are shuffled when idx == 0, and this depends on iter. When you set iter to 1 or to nEpochs, you don't shuffle the data at the same times. Is that correct?

LBerger ( 2019-03-05 02:36:27 -0500 )

Shuffling data differently shouldn’t have that big of an effect on how quickly the error converges?

WreckItTim ( 2019-03-05 02:39:28 -0500 )

Your network is a random network. It's a regression problem, isn't it?

LBerger ( 2019-03-05 02:42:13 -0500 )

Yes, it’s a regression problem. The training data is still the same for both network1 and network2. Shouldn’t the randomness balance out after doing multiple epochs (each epoch reshuffles the data)?

WreckItTim ( 2019-03-05 02:51:09 -0500 )

try with :

network1->setTermCriteria(cv::TermCriteria(cv::TermCriteria::COUNT + cv::TermCriteria::EPS, 10, 0));

and

network2->setTermCriteria(cv::TermCriteria(cv::TermCriteria::COUNT + cv::TermCriteria::EPS, nEpochs*10, 0));

and nTestRows=3. What is your question? Backpropagation uses a way to find a solution, but this way depends on the data.

LBerger ( 2019-03-05 03:20:20 -0500 )

Hey LBerger, I rewrote the question to make the problem clearer.

WreckItTim ( 2019-03-05 12:28:20 -0500 )