1

Is the role played by activation functions significant only during the training of a neural network, or do they also play a role during testing (when, after training, we supply data to the network for prediction)?

I understand that a straight line cannot separate data scattered in a complex manner, but then why don't we use simple polynomials?

Why specifically sigmoid, tanh, or ReLU? What exactly are they doing?

What do activation functions do when we supply data during training, and what do they do when we supply test data for prediction once the network has been trained?

Harshal
  • 167
  • 4
  • You're asking too many questions in the same post. – nbro Mar 03 '19 at 09:33
  • 2
    Possible duplicate of [What is the purpose of an activation function in neural networks?](https://ai.stackexchange.com/questions/5493/what-is-the-purpose-of-an-activation-function-in-neural-networks) – nbro Oct 29 '19 at 11:43

1 Answer

2

An activation function is a non-linear function. The operation in a neuron without an activation function is just a linear function. If we don't put activation functions between the operations of the neurons, then stacking layers is useless.

For example, if you have a two-layer network, then when you are doing forward propagation, the output of your first layer (without an activation function) will be calculated as:

$O_1 = W_1X+b_1 $

Then the output of your second layer will be:

$O_2 = W_2O_1+b_2 $

If we substitute $O_1$, the output of your second layer can be calculated as:

$O_2 = W_2(W_1X+b_1)+b_2 $

or simply

$O_2 = W_2W_1X+W_2b_1+b_2 $

Since we train a neural network to optimize the values of $W$ and $b$ (we train to find the best values for them), instead of training a neural network with two layers we are actually just training a one-layer network. From the last formula, if we let $W_3 = W_2W_1$ and $b_3 = W_2b_1+b_2$, then our two-layer network is just another linear model:

$O_2 = W_3X+b_3 $
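To make this concrete, here is a minimal NumPy sketch (the shapes and values are arbitrary, chosen only for illustration) verifying numerically that the two linear layers collapse into one:

```python
import numpy as np

rng = np.random.default_rng(0)

# Arbitrary sizes: 4 inputs, 5 hidden units, 3 outputs
X = rng.normal(size=(4, 1))
W1, b1 = rng.normal(size=(5, 4)), rng.normal(size=(5, 1))
W2, b2 = rng.normal(size=(3, 5)), rng.normal(size=(3, 1))

# Two linear layers, no activation function in between
O1 = W1 @ X + b1
O2 = W2 @ O1 + b2

# The equivalent single linear layer
W3 = W2 @ W1
b3 = W2 @ b1 + b2

print(np.allclose(O2, W3 @ X + b3))  # True: the two layers are one linear map
```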

We don't want that: we add layers to get a more complex model. That's why we use a non-linear activation function: to prevent our deep model from collapsing into a simple linear function. Note that the activation function is part of the model itself, so it is applied in every forward pass, both during training and when you feed in test data for prediction.
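Reusing the setup from the sketch above, inserting a non-linearity such as ReLU between the layers breaks this collapse: the composition is no longer an affine function of $X$, so no single pair $(W_3, b_3)$ can reproduce it for all inputs:

```python
def relu(z):
    # ReLU: a common non-linear activation function
    return np.maximum(z, 0)

# Same layers as before, but with ReLU after the first one
O1_nl = relu(W1 @ X + b1)
O2_nl = W2 @ O1_nl + b2

# The collapsed linear layer from above no longer matches
print(np.allclose(O2_nl, W3 @ X + b3))  # False (in general)
```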

malioboro
  • 2,729
  • 3
  • 20
  • 46
  • The answer has very little relevance in the context of the question, which itself is unclear... So if you want to answer a question definitively, I suggest you first clarify the question with the OP. –  Feb 01 '19 at 07:46
  • @DuttaA That's true, I'm sorry for answering it too hastily. After re-reading the question, I realize it is unclear; I'll now wait for additional explanation from the OP – malioboro Feb 01 '19 at 07:52
  • I understand that a straight line cannot separate data scattered in a complex manner, but then why don't we use simple polynomials? Why specifically sigmoid, tanh, or ReLU? What exactly are they doing? – Harshal Feb 01 '19 at 08:16
  • 1
    @user193118 I have answered a question about activations, but for a true understanding you have to dig through both datascience.se and ai.se... There are a lot of similar questions that will clear your doubt about why we use sigmoid/tanh and not other polynomials. –  Feb 01 '19 at 08:20