Questions tagged [activation-functions]
For questions related to the selection of, and theory behind, specific activation functions used in artificial neural networks.
162 questions
40
votes
5 answers
What is the purpose of an activation function in neural networks?
It is said that activation functions in neural networks help introduce non-linearity.
What does this mean?
What does non-linearity mean in this context?
How does the introduction of this non-linearity help?
Are there any other purposes of…

Mohsin
- 972
- 1
- 9
- 15
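A minimal numpy sketch of the point raised in the question above: without an activation function, stacking layers collapses into a single affine map, so depth adds nothing; inserting a non-linearity breaks that equivalence. The layer sizes and random weights here are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(4,))                        # toy input vector
W1, b1 = rng.normal(size=(5, 4)), rng.normal(size=(5,))
W2, b2 = rng.normal(size=(3, 5)), rng.normal(size=(3,))

# Two "layers" with no activation in between ...
no_act = W2 @ (W1 @ x + b1) + b2

# ... are exactly one affine map: W x + b with W = W2 W1, b = W2 b1 + b2.
collapsed = (W2 @ W1) @ x + (W2 @ b1 + b2)
print(np.allclose(no_act, collapsed))            # True: depth bought nothing

# Inserting a non-linearity (ReLU) between the layers breaks this equivalence.
with_relu = W2 @ np.maximum(0.0, W1 @ x + b1) + b2
print(np.allclose(no_act, with_relu))            # False in general
```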
24
votes
3 answers
How to choose an activation function for the hidden layers?
I choose the activation function for the output layer depending on the output that I need and the properties of the activation function that I know. For example, I choose the sigmoid function when I'm dealing with probabilities, a ReLU when I'm…

gvgramazio
- 696
- 2
- 7
- 19
22
votes
1 answer
What are the advantages of ReLU vs Leaky ReLU and Parametric ReLU (if any)?
I think that the advantage of using Leaky ReLU instead of ReLU is that in this way we cannot have a vanishing gradient. Parametric ReLU has the same advantage, with the only difference being that the slope of the output for negative inputs is a learnable…

gvgramazio
- 696
- 2
- 7
- 19
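For reference, a small numpy sketch contrasting the three variants compared in the question above. The leaky slope 0.01 and the PReLU slope value are illustrative; in PReLU the slope `a` is a parameter learned during training rather than a fixed constant.

```python
import numpy as np

def relu(x):
    # Zero gradient for x < 0: negative units can "die".
    return np.maximum(0.0, x)

def leaky_relu(x, slope=0.01):
    # Small fixed slope keeps a non-zero gradient for negative inputs.
    return np.where(x > 0, x, slope * x)

def prelu(x, a):
    # Same form as Leaky ReLU, but `a` is learned during training.
    return np.where(x > 0, x, a * x)

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(relu(x))            # [ 0.     0.     0.     0.5    2.   ]
print(leaky_relu(x))      # [-0.02  -0.005  0.     0.5    2.   ]
print(prelu(x, a=0.25))   # [-0.5   -0.125  0.     0.5    2.   ]
```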
19
votes
4 answers
What activation function does the human brain use?
Does the human brain use a specific activation function?
I've tried doing some research, and as it's a threshold for whether the signal is sent through a neuron or not, it sounds a lot like ReLU. However, I can't find a single article confirming…

mlman
- 301
- 2
- 5
17
votes
2 answers
Are softmax outputs of classifiers true probabilities?
BACKGROUND: The softmax function is the most common choice for an activation function for the last dense layer of a multiclass neural network classifier. The outputs of the softmax function have mathematical properties of probabilities and are--in…

Snehal Patel
- 912
- 1
- 1
- 25
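A short numpy sketch of the mathematical properties referred to above: softmax outputs are non-negative and sum to 1, so they form a valid probability distribution, even though, as the question asks, that alone does not make them well-calibrated probabilities. The logits below are made up.

```python
import numpy as np

def softmax(z):
    # Subtract the max for numerical stability; this does not change the result.
    e = np.exp(z - np.max(z))
    return e / e.sum()

logits = np.array([2.0, 1.0, -0.5])   # hypothetical class scores
p = softmax(logits)
print(p)          # approx [0.69 0.25 0.06]
print(p.sum())    # 1.0: a valid distribution, but not necessarily calibrated
```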
15
votes
4 answers
Why do activation functions need to be differentiable in the context of neural networks?
Why should an activation function of a neural network be differentiable? Is it strictly necessary or is it just advantageous?
user3642
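One concrete way to see the issue: ReLU has no derivative at exactly 0 (the one-sided difference quotients disagree), yet networks using it train fine because autodiff frameworks simply pick a subgradient there. A tiny numpy check of the kink, purely illustrative:

```python
import numpy as np

relu = lambda x: np.maximum(0.0, x)
h = 1e-6

# One-sided difference quotients at x = 0 disagree: the function has a kink.
left = (relu(0.0) - relu(-h)) / h    # ~ 0.0
right = (relu(h) - relu(0.0)) / h    # ~ 1.0
print(left, right)

# In practice, autodiff libraries use a subgradient (commonly 0) at x = 0,
# so differentiability everywhere is convenient but not strictly required.
```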
13
votes
1 answer
How exactly can ReLUs approximate non-linear and curved functions?
Currently, the most commonly used activation functions are ReLUs. So I answered this question What is the purpose of an activation function in neural networks? and, while writing the answer, it struck me: how exactly can ReLUs approximate a…
user9947
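The intuition behind the question above can be shown numerically: a weighted sum of shifted ReLUs is a piecewise-linear function, and with enough pieces it can track a smooth curve closely. A rough numpy sketch; the knot positions and the least-squares fit (standing in for gradient training) are just one illustrative choice.

```python
import numpy as np

x = np.linspace(-np.pi, np.pi, 400)
target = np.sin(x)                       # a smooth, curved target function

# Features: a bias, a linear term, and shifted ReLU "hinges" at fixed knots.
knots = np.linspace(-np.pi, np.pi, 12)
features = np.column_stack(
    [np.ones_like(x), x] + [np.maximum(0.0, x - k) for k in knots]
)

# Fit the output-layer weights by least squares (stand-in for training).
w, *_ = np.linalg.lstsq(features, target, rcond=None)
approx = features @ w

print(np.max(np.abs(approx - target)))   # small max error: piecewise-linear fit
```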
12
votes
2 answers
What does it mean for a neuron in a neural network to be activated?
I just stumbled upon the concept of neuron coverage, which is the ratio of activated neurons to total neurons in a neural network. But what does it mean for a neuron to be "activated"? I know what activation functions are, but what does being…

Leon
- 173
- 7
12
votes
3 answers
Why is the derivative of the activation functions in neural networks important?
I'm new to neural networks and am trying to understand some of their foundations. One question I have is: why is the derivative of an activation function important (not the function itself), and why is it the derivative that is tied to how the network performs…

Tina J
- 973
- 6
- 13
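A compact illustration of why the derivative shows up: backpropagation multiplies the upstream gradient by the activation's derivative at each unit, so it is the derivative, not the raw activation value, that scales the weight update. This uses a single toy sigmoid unit; all numbers are arbitrary.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_deriv(z):
    s = sigmoid(z)
    return s * (1.0 - s)

# One unit: y = sigmoid(w * x), squared-error loss against a target t.
w, x, t = 0.5, 2.0, 1.0
z = w * x
y = sigmoid(z)
loss = 0.5 * (y - t) ** 2

# Chain rule: dL/dw = (y - t) * sigmoid'(z) * x -- the activation's derivative
# sits in the middle of the product and directly scales the update.
grad_w = (y - t) * sigmoid_deriv(z) * x
w_new = w - 0.1 * grad_w                 # one gradient-descent step
print(grad_w, w_new)
```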
11
votes
2 answers
Why do we prefer ReLU over linear activation functions?
The ReLU activation function is defined as follows
$$y = \operatorname{max}(0,x)$$
And the linear activation function is defined as follows
$$y = x$$
The ReLU nonlinearity just clips the values less than 0 to 0 and passes everything else. Then why…

imflash217
- 499
- 4
- 14
10
votes
3 answers
Are ReLUs incapable of solving certain problems?
Background
I've been interested in and reading about neural networks for several years, but I hadn't gotten around to testing them out until recently.
Both for fun and to increase my understanding, I tried to write a class library from scratch in…

Benjamin Chambers
- 221
- 1
- 8
9
votes
1 answer
What happens when I mix activation functions?
There are several activation functions, such as ReLU, sigmoid or $\tanh$. What happens when I mix activation functions?
I recently found that Google has developed the Swish activation function, which is $x \cdot \operatorname{sigmoid}(x)$. By altering the activation function, can it…

JSChang
- 93
- 1
- 6
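For clarity, Swish is $x \cdot \sigma(x)$, i.e. the input multiplied by its own sigmoid, and "mixing" activation functions simply means different layers (or units) apply different such functions. A minimal numpy sketch of Swish next to ReLU, with made-up sample inputs:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def swish(x):
    # Swish: x * sigmoid(x); smooth, slightly non-monotonic for negative x.
    return x * sigmoid(x)

def relu(x):
    return np.maximum(0.0, x)

x = np.array([-4.0, -1.0, 0.0, 1.0, 4.0])
print(relu(x))    # [ 0.    0.    0.    1.    4.  ]
print(swish(x))   # approx [-0.07 -0.27  0.    0.73  3.93]
```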
8
votes
1 answer
What's the advantage of log_softmax over softmax?
Previously I have learned that the softmax as the output layer coupled with the log-likelihood cost function (the same as the nll_loss in pytorch) can solve the learning slowdown problem.
However, while I am learning the pytorch mnist tutorial,…

user1024
- 181
- 2
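One way to see the numerical-stability part of this: computing log(softmax(z)) naively can overflow or underflow for large logits, whereas log_softmax computed with the log-sum-exp trick stays finite. A numpy sketch of the idea, not PyTorch's actual implementation:

```python
import numpy as np

z = np.array([1000.0, 0.0, -1000.0])   # extreme logits to provoke overflow

# Naive: exp(1000) overflows to inf, so the log-probabilities break
# (numpy emits runtime warnings and produces nan / -inf entries).
naive = np.log(np.exp(z) / np.exp(z).sum())

# log_softmax via the log-sum-exp trick: shift by max(z) first.
m = np.max(z)
log_softmax = z - (m + np.log(np.sum(np.exp(z - m))))

print(naive)         # [nan -inf -inf]
print(log_softmax)   # finite: [0. -1000. -2000.]
```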
8
votes
1 answer
Why isn't the ElliotSig activation function widely used?
The Softsign (a.k.a. ElliotSig) activation function is really simple:
$$ f(x) = \frac{x}{1+|x|} $$
It is bounded in $[-1,1]$, has a first derivative, is monotonic, and is computationally extremely simple (easy for, e.g., a GPU).
Why is it not…

Pietro
- 183
- 1
- 8
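A small sketch of the function cited above, purely for reference: Softsign needs no exponentials and is bounded like tanh, but its tails approach ±1 only polynomially, one difference often pointed to when comparing the two. The sample inputs are arbitrary.

```python
import numpy as np

def softsign(x):
    # Softsign / ElliotSig: bounded in (-1, 1), no exponentials needed.
    return x / (1.0 + np.abs(x))

def softsign_deriv(x):
    return 1.0 / (1.0 + np.abs(x)) ** 2

x = np.array([-10.0, -1.0, 0.0, 1.0, 10.0])
print(softsign(x))         # [-0.909 -0.5    0.     0.5    0.909]
print(np.tanh(x))          # [-1.    -0.762  0.     0.762  1.   ]
print(softsign_deriv(x))   # derivative decays like 1/x^2 rather than exponentially
```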
7
votes
1 answer
What makes multi-layer neural networks able to perform nonlinear operations?
As far as I know, a single-layer neural network can only perform linear operations, but multi-layer ones can also perform nonlinear ones.
Also, I recently learned that finite matrices/tensors, which are used in many neural networks, can only represent linear operations.
However,…

KYHSGeekCode
- 173
- 6