
MLP Activation Function - train vs predict

asked 2017-09-30 20:05:29 -0600

WreckItTim

Hey, so my question was for the developers.

In the docs, it says that training is done using the sigmoid activation function. However, the predict section says it uses y = 1.7159*tanh(2/3 * x).

Can I just have a little clarification, please? I don't understand why they train and predict with two different equations. Besides being genuinely curious why they differ, I also need responses in the range [0,1]. I could normalize the responses, but a better solution is to ensure the values in my feature vector are positive (which causes no loss of data in my case), which ensures the sigmoid responses fall in [0,1]. This works on the training side, but not on the testing side, since a different equation is used there and responses will now fall in [0, 1.7159]. I would rather avoid another normalization, if possible.
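The range issue above can be checked with a quick sketch in plain Python, using the symmetric-sigmoid formula from the OpenCV docs (the function name `sym_sigmoid` and the default parameter values are illustrative, not OpenCV API):

```python
import math

def sym_sigmoid(x, alpha=2/3, beta=1.7159):
    """Symmetric sigmoid from the OpenCV ANN_MLP docs:
    f(x) = beta * (1 - e^(-alpha*x)) / (1 + e^(-alpha*x))."""
    return beta * (1 - math.exp(-alpha * x)) / (1 + math.exp(-alpha * x))

# With positive inputs the unscaled (beta = 1) response stays inside [0, 1) ...
vals = [sym_sigmoid(x, beta=1.0) for x in (0.0, 0.5, 2.0, 10.0)]
print(all(0.0 <= v < 1.0 for v in vals))  # True

# ... but with the default beta = 1.7159 the response can exceed 1.
print(sym_sigmoid(10.0) > 1.0)            # True
```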

Thanks for the time! -Tim



"However, when you use predict it uses y = 1.7159*tanh(2/3 * x)." Where do you find that?

LBerger ( 2017-10-02 01:26:59 -0600 )

In the docs, the sigmoid function is described in the top section, and the hyperbolic tangent is explained under the predict section.

WreckItTim ( 2017-10-02 02:01:49 -0600 )

OK, I'm not a native English speaker, but I understand it is a conditional:

"If you are using the default cvANN_MLP::SIGMOID_SYM activation function with the default parameter values fparam1=0 and fparam2=0 then the function used is y = 1.7159*tanh(2/3 * x), so the output will range from [-1.7159, 1.7159], instead of [0,1]."

In the source code (OpenCV 3.3), the activation function is defined here, and predict is here. It is the same function.

Now, if you want to change the activation function, use [setActivationFunction](http://docs.op...

LBerger ( 2017-10-02 02:32:51 -0600 )

Yeah, you nailed it. Smart thinking, diving into the source code. I posted an answer to this.

WreckItTim ( 2017-10-02 02:48:29 -0600 )

1 answer


answered 2017-10-04 17:55:39 -0600

WreckItTim

updated 2017-10-04 17:57:48 -0600

Okay, so tanh(x) = (1 - e^(-2x)) / (1 + e^(-2x)), which is the symmetric sigmoid, so the two activation functions are the same. The sigmoid function for training is defined as f(x) = (beta) * (1 - e^(-(alpha)x)) / (1 + e^(-(alpha)x)). The predict function is described as y = 1.7159*tanh(2/3 * x).
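The identity can be verified numerically (a plain-Python sanity check, not OpenCV code):

```python
import math

# Sanity check of the identity tanh(x) = (1 - e^(-2x)) / (1 + e^(-2x)),
# i.e. the symmetric sigmoid with alpha = 2 and beta = 1.
for x in (-3.0, -0.5, 0.0, 0.5, 3.0):
    lhs = math.tanh(x)
    rhs = (1 - math.exp(-2 * x)) / (1 + math.exp(-2 * x))
    assert abs(lhs - rhs) < 1e-12
print("identity holds")
```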

LBerger noticed in the source code that this can change with fparam1 and fparam2, which after some more digging turn out to be alpha and beta, respectively. By default they are set to zero, but the source code then replaces each value with 2/3 and 1.7159 respectively whenever it is less than FLT_EPSILON, which is just a tiny decimal - so the default value of zero means the default values 2/3 and 1.7159 are used.
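The default-substitution behavior described above can be sketched like this (plain Python mimicking the logic, not the actual OpenCV source; `resolve_params` is an illustrative name):

```python
FLT_EPSILON = 1.19209290e-07  # value of C's FLT_EPSILON

def resolve_params(fparam1=0.0, fparam2=0.0):
    """Mimic the substitution described above: parameters at or below
    FLT_EPSILON are replaced by the defaults 2/3 and 1.7159."""
    alpha = fparam1 if fparam1 > FLT_EPSILON else 2/3
    beta = fparam2 if fparam2 > FLT_EPSILON else 1.7159
    return alpha, beta

print(resolve_params())          # zeros fall through to (2/3, 1.7159)
print(resolve_params(1.0, 1.0))  # explicit values are kept: (1.0, 1.0)
```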

I'm not sure why they chose these numbers, but if I want to scale my responses to [0,1], then I need to ensure all passed-in values are positive and that beta is set to 1.

I would like to set alpha to 2, as this recovers the original tanh(x) identity. However, I am not sure how the conversion between alpha and fparam1 (2/3) is made: in order to maintain the identity with alpha = 2, fparam1 for tanh (the value set to 2/3 by default) should be 1, as seen from tanh(x) = (1 - e^(-2x)) / (1 + e^(-2x)).

So at some point I would expect something like alpha = 2 * fparam1, but I do not see this anywhere in the source code; it seems fparam1 is used directly as alpha, which is not the equality relationship between the hyperbolic tangent and the sigmoid function. Is there a reason why?
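The factor-of-2 mismatch being asked about can be shown numerically (a plain-Python sketch of the two formulas, not OpenCV code):

```python
import math

def sym_sigmoid(x, alpha, beta):
    # beta * (1 - e^(-alpha*x)) / (1 + e^(-alpha*x)), which mathematically
    # equals beta * tanh(alpha*x/2)
    return beta * (1 - math.exp(-alpha * x)) / (1 + math.exp(-alpha * x))

x = 1.0
# The documented predict formula:
target = 1.7159 * math.tanh(2/3 * x)
# Plugging fparam1 = 2/3 straight in as alpha gives beta*tanh(x/3) instead ...
a = sym_sigmoid(x, alpha=2/3, beta=1.7159)
print(abs(a - target) > 1e-3)   # True: the factor of 2 matters
# ... whereas alpha = 2 * fparam1 = 4/3 reproduces it exactly.
b = sym_sigmoid(x, alpha=4/3, beta=1.7159)
print(abs(b - target) < 1e-12)  # True
```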





Seen: 325 times

Last updated: Oct 04 '17