OpenCV MLP with Sigmoid Neurons, Output range

I have searched for answers to the following question here on SO and on Google, but haven't found anything, so here is my situation:

I want to implement an MLP that learns some similarity function. I have training and test samples, and the MLP is set up and running. My problem is how to provide the teacher outputs to the net, i.e. from which value range they have to come.

Here is the relevant part of my code:

// terminate after 1000 iterations or when the error change drops below 1e-6;
// backpropagation with weight update scale 0.1 and momentum 0.1
CvANN_MLP_TrainParams params(
    cvTermCriteria(CV_TERMCRIT_ITER+CV_TERMCRIT_EPS, 1000, 0.000001),
    CvANN_MLP_TrainParams::BACKPROP,
    0.1,
    0.1);

// input layer, one hidden layer, one output neuron
Mat layers = (Mat_<int>(3,1) << FEAT_SIZE, H_NEURONS, 1);

// symmetric sigmoid activation with fparam1=1, fparam2=1
CvANN_MLP net(layers, CvANN_MLP::SIGMOID_SYM, 1, 1);

int iter = net.train(X, Y, Mat(), Mat(), params);

net.predict(X_test, predictions);

The number of input and hidden neurons is set somewhere else, and the net has 1 output neuron. X, Y and X_test are Mats containing the training and test samples; no problem here. The problem is what value range my Y's have to come from and what value range the predictions will come from.
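
For completeness, the Mats are laid out like this (a minimal sketch; FEAT_SIZE, N_TRAIN and N_TEST stand in for my actual values):

// one row per sample, CV_32FC1, as CvANN_MLP::train() expects
Mat X(N_TRAIN, FEAT_SIZE, CV_32FC1);       // training features
Mat Y(N_TRAIN, 1, CV_32FC1);               // teacher outputs, one per sample
Mat X_test(N_TEST, FEAT_SIZE, CV_32FC1);   // test features
Mat predictions;                           // filled by net.predict()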

In the documentation I have found the following statements:

For training:

If you are using the default cvANN_MLP::SIGMOID_SYM activation function then the output should be in the range [-1,1], instead of [0,1], for optimal results.

Since I'm NOT using the default sigmoid function (the one with alpha=0 and beta=0), I'm providing my Y's from [0,1]. Is this right, or do they mean something else by 'default sigmoid function'? I'm asking this because for prediction they explicitly mention alpha and beta:

If you are using the default cvANN_MLP::SIGMOID_SYM activation function with the default parameter values fparam1=0 and fparam2=0 then the function used is y = 1.7159*tanh(2/3 * x), so the output will range from [-1.7159, 1.7159], instead of [0,1].

Again, since I'm not using the default sigmoid function, I assume I will get predictions from [0,1]. Am I right so far?
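
For reference, the general form the docs give for SIGMOID_SYM is f(x) = beta*(1 - e^(-alpha*x))/(1 + e^(-alpha*x)), which is the same as beta*tanh(alpha*x/2), so the activation itself should saturate at ±beta. Here is a small standalone sanity check of that formula with the parameters I pass to the constructor (alpha=1, beta=1); this is my own check, not OpenCV code:

#include <cmath>
#include <cstdio>

// symmetric sigmoid as stated in the OpenCV docs:
// f(x) = beta * (1 - exp(-alpha*x)) / (1 + exp(-alpha*x)) = beta * tanh(alpha*x/2)
double sigmoid_sym(double x, double alpha, double beta) {
    return beta * (1.0 - std::exp(-alpha * x)) / (1.0 + std::exp(-alpha * x));
}

int main() {
    // with the constructor arguments I use: fparam1 = 1, fparam2 = 1
    for (double x = -10.0; x <= 10.0; x += 2.5)
        std::printf("f(%+5.1f) = %+.6f\n", x, sigmoid_sym(x, 1.0, 1.0));
    return 0;  // the values approach -1 and +1 but never leave (-1, 1)
}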

What is confusing me here is that I've found another question regarding the output range of OpenCV's sigmoid function, which says the range has to be [-1,1].

And now comes the real confusion: when I train the net and let it make some predictions, I get values slightly larger than 1 (around 1.03), regardless of whether my Y's come from [0,1] or [-1,1]. This shouldn't happen in either case.
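
For what it's worth, this is how I check the range of the raw predictions (printPredictionRange is just my own helper; minMaxLoc is the standard OpenCV function):

#include <iostream>
#include <opencv2/core/core.hpp>

// print the smallest and largest raw prediction; this is where I see
// maxima around 1.03 with both label schemes
void printPredictionRange(const cv::Mat& predictions) {
    double minVal, maxVal;
    cv::minMaxLoc(predictions, &minVal, &maxVal);
    std::cout << "predictions in [" << minVal << ", " << maxVal << "]" << std::endl;
}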

Could somebody please enlighten me? Am I missing something here?

Thanks in advance.

EDIT:

To make things very clear, I came up with a small example that shows the problem:

#include <iostream>
#include <opencv2/core/core.hpp>
#include <opencv2/ml/ml.hpp>

using namespace cv;
using namespace std;

int main() {

    int POS = 1;          // label for points inside the circle
    int NEG = -1;         // label for points outside

    int SAMPLES = 100;
    float SPLIT = 0.8f;   // fraction of the samples used for training

    float C_X = 0.5f;     // circle center ...
    float C_Y = 0.5f;
    float R = 0.3f;       // ... and radius

    Mat X(SAMPLES, 2, CV_32FC1);
    Mat Y(SAMPLES, 1, CV_32FC1);

    randu(X, 0, 1);   // random points in the unit square

    // label depends on whether the point lies inside the circle
    for(int i = 0; i < SAMPLES; i++){
        Y.at<float>(i,0) = pow((X.at<float>(i,0) - C_X),2) + pow((X.at<float>(i,1) - C_Y),2) < pow(R,2) ? POS : NEG;
    }

    // 80/20 split into training and test set
    Mat X_train = X(Range(0, (int)(SAMPLES*SPLIT)), Range::all());
    Mat Y_train = Y(Range(0, (int)(SAMPLES*SPLIT)), Range::all());

    Mat X_test = X(Range((int)(SAMPLES*SPLIT), SAMPLES), Range::all());
    Mat Y_test = Y(Range((int)(SAMPLES*SPLIT), SAMPLES), Range::all());

    // same training setup as in the snippet above
    CvANN_MLP_TrainParams params(
                 cvTermCriteria(CV_TERMCRIT_ITER+CV_TERMCRIT_EPS, 1000, 0.000001),
                 CvANN_MLP_TrainParams::BACKPROP,
                 0.1,
                 0.1);

    // 2 inputs, 4 hidden neurons, 1 output
    Mat layers = (Mat_<int>(3,1) << 2, 4, 1);

    CvANN_MLP net(layers, CvANN_MLP::SIGMOID_SYM, 1, 1);
    net.train(X_train, Y_train, Mat(), Mat(), params);

    Mat predictions(Y_test.size(), CV_32F); 
    net.predict(X_test, predictions);

    cout << predictions << endl;

    // mean squared error of the raw predictions on the test set
    Mat error = predictions - Y_test;
    multiply(error, error, error);

    float mse = sum(error)[0]/error.rows;

    cout << "MSE: " << mse << endl;

    return 0;
}

This code generates a set of random points from the unit square and assigns the label POS or NEG to each of them, depending on whether it lies inside the circle given by C_X, C_Y and R. Then a training and a test set are generated and the MLP is trained. Now we have two situations:

  1. POS = 1, NEG = -1:

The output is provided to the net as it should be for tanh neurons (from [-1,1]), and I expect predictions from that range. But I also get predictions like -1.018 or 1.052. The mean squared error in this case was 0.13071 for me.

  2. POS = 1, NEG = 0:

The output is given the way the documentation says is optimal (at least that is how I understand it). And since I'm not using the default sigmoid function, I expect predictions from [0,1]. But I also get values like 1.0263158 and even negative ones. The MSE in this case is better, at 0.0326775.

I know this example is a classification problem, and normally I would just round the values to the closest label, but I want to learn a similarity function and have to rely on the predictions coming from some fixed range.
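
Just to illustrate what I mean by rounding (roundToLabels is my own hypothetical helper, not an OpenCV function); it would hide the out-of-range values in the classification case, but it is no use for the similarity function I actually want:

#include <cmath>
#include <opencv2/core/core.hpp>

// snap each raw prediction to the closest of the two labels
void roundToLabels(cv::Mat& predictions, float pos, float neg) {
    for (int i = 0; i < predictions.rows; i++) {
        float p = predictions.at<float>(i, 0);
        predictions.at<float>(i, 0) =
            std::fabs(p - pos) < std::fabs(p - neg) ? pos : neg;
    }
}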