Questions tagged [relu]

For questions about the rectified linear unit (ReLU), or rectifier, a widely used activation function in neural networks.

49 questions
22 votes • 1 answer

What are the advantages of ReLU vs Leaky ReLU and Parametric ReLU (if any)?

I think that the advantage of using Leaky ReLU instead of ReLU is that this way we cannot have a vanishing gradient. Parametric ReLU has the same advantage, with the only difference being that the slope of the output for negative inputs is a learnable…
gvgramazio • 696 • 2 • 7 • 19
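For reference, a minimal NumPy sketch of the three variants this question compares; the function names and the `alpha` slope parameter are illustrative, not tied to any particular library.

```python
import numpy as np

def relu(x):
    # Zero for negative inputs, identity for positive inputs.
    return np.maximum(0.0, x)

def leaky_relu(x, alpha=0.01):
    # Small fixed slope alpha for negative inputs keeps a nonzero gradient there.
    return np.where(x > 0, x, alpha * x)

def prelu(x, alpha):
    # Same form as Leaky ReLU, but alpha is a learnable parameter (per channel or shared).
    return np.where(x > 0, x, alpha * x)

x = np.array([-2.0, -0.5, 0.0, 1.5])
print(relu(x))               # [0., 0., 0., 1.5]
print(leaky_relu(x))         # [-0.02, -0.005, 0., 1.5]
print(prelu(x, alpha=0.25))  # [-0.5, -0.125, 0., 1.5]
```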
13 votes • 1 answer

How exactly can ReLUs approximate non-linear and curved functions?

Currently, the most commonly used activation functions are ReLUs. So I answered this question: What is the purpose of an activation function in neural networks? While writing the answer, it struck me: how exactly can ReLUs approximate a…
user9947
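A rough sketch of the idea behind this question (my own illustration, not from the question): a weighted sum of shifted ReLUs is piecewise linear, and with enough kinks it can follow a smooth curve such as $x^2$ closely on a bounded interval.

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

# Approximate f(x) = x^2 on [0, 1] with a piecewise-linear interpolant built from ReLUs.
knots = np.linspace(0.0, 1.0, 11)   # breakpoints of the piecewise-linear fit
targets = knots ** 2

def pwl_from_relus(x):
    # Start from the first segment's line, then add a "kink" (shifted ReLU) at each knot
    # whose weight is the change in slope needed to hit the next target value.
    slopes = np.diff(targets) / np.diff(knots)
    y = targets[0] + slopes[0] * (x - knots[0])
    for k in range(1, len(slopes)):
        y += (slopes[k] - slopes[k - 1]) * relu(x - knots[k])
    return y

xs = np.linspace(0.0, 1.0, 1001)
print(np.max(np.abs(pwl_from_relus(xs) - xs ** 2)))  # max error ~2.5e-3 with 10 segments
```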
11 votes • 2 answers

Why do we prefer ReLU over linear activation functions?

The ReLU activation function is defined as $$y = \max(0, x)$$ and the linear activation function as $$y = x.$$ The ReLU nonlinearity just clips values less than 0 to 0 and passes everything else unchanged. Then why…
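A small sketch of the standard argument (my own, not from the question): stacking purely linear layers collapses to a single linear map, while a ReLU in between does not.

```python
import numpy as np

rng = np.random.default_rng(0)
W1, W2 = rng.normal(size=(4, 3)), rng.normal(size=(2, 4))
x = rng.normal(size=3)

# Two linear layers are equivalent to a single linear layer with weights W2 @ W1.
linear_stack = W2 @ (W1 @ x)
collapsed = (W2 @ W1) @ x
print(np.allclose(linear_stack, collapsed))   # True

# With a ReLU in between, no single matrix reproduces the map for all inputs.
relu_stack = W2 @ np.maximum(0.0, W1 @ x)
print(np.allclose(relu_stack, collapsed))     # generally False
```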
10 votes • 3 answers

Are ReLUs incapable of solving certain problems?

Background I've been interested in and reading about neural networks for several years, but I haven't gotten around to testing them out until recently. Both for fun and to increase my understanding, I tried to write a class library from scratch in…
9 votes • 1 answer

What happens when I mix activation functions?

There are several activation functions, such as ReLU, sigmoid, or $\tanh$. What happens when I mix activation functions? I recently found that Google has developed the Swish activation function, which is $x \cdot \operatorname{sigmoid}(x)$. By altering the activation function, can it…
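For concreteness, a minimal sketch of the Swish activation mentioned here, $x \cdot \operatorname{sigmoid}(\beta x)$ with the common default $\beta = 1$ (also known as SiLU):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def swish(x, beta=1.0):
    # Swish: x * sigmoid(beta * x); with beta = 1 this is also known as SiLU.
    return x * sigmoid(beta * x)

x = np.array([-4.0, -1.0, 0.0, 1.0, 4.0])
print(swish(x))  # smooth, non-monotonic near 0, and close to ReLU for large |x|
```

Unlike ReLU, Swish is smooth everywhere, which is part of why it is often discussed as a drop-in replacement.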
5 votes • 2 answers

Why is tf.abs non-differentiable in TensorFlow?

I understand why tf.abs is non-differentiable in principle (its derivative is discontinuous at 0), but the same applies to tf.nn.relu; yet, in the case of that function, the gradient is simply set to 0 at 0. Why is the same logic not applied to tf.abs? Whenever I tried to use…
zedsdead • 53 • 3
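Both $|x|$ and $\operatorname{ReLU}(x)$ are continuous but have a kink at 0, so a framework only has to pick some value from the subdifferential there. A plain-NumPy sketch of that convention (an illustration, not TensorFlow's actual gradient kernels, which may differ):

```python
import numpy as np

def relu_grad(x):
    # Derivative of max(0, x): 1 for x > 0, 0 for x < 0; pick 0 at the kink x == 0.
    return (x > 0).astype(float)

def abs_grad(x):
    # Derivative of |x|: sign(x) away from 0; np.sign picks 0 at the kink x == 0.
    return np.sign(x)

x = np.array([-2.0, 0.0, 3.0])
print(relu_grad(x))  # [0. 0. 1.]
print(abs_grad(x))   # [-1.  0.  1.]
```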
5 votes • 2 answers

In deep learning, is it possible to use discontinuous activation functions?

In deep learning, is it possible to use discontinuous activation functions (e.g. one with a jump discontinuity)? (My guess: for example, ReLU is non-differentiable at a single point, but it still has a well-defined derivative almost everywhere. If an activation…
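As a small illustration of why a genuine jump discontinuity is harder to train through than ReLU's single kink (my own sketch, not from the question): a step activation has zero derivative wherever it is differentiable, so backpropagation receives no signal through it.

```python
import numpy as np

def step(x):
    # Heaviside step: a jump discontinuity at 0.
    return (x >= 0).astype(float)

def step_grad(x):
    # Zero wherever the step is differentiable, so backprop gets no signal;
    # ReLU, by contrast, is continuous and has gradient 1 on the whole positive half-line.
    return np.zeros_like(x)

x = np.array([-1.0, 0.5, 2.0])
print(step(x), step_grad(x))  # [0. 1. 1.] [0. 0. 0.]
```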
4 votes • 1 answer

Why should one ever use ReLU instead of PReLU?

To me, it seems that PReLU is strictly better than ReLU. It does not have the dying ReLU problem, it allows negative values, and it has trainable parameters (which are computationally negligible to adjust). Only if we want the network to output…
4 votes • 0 answers

Should batch normalisation be applied before or after ReLU?

I know that there has been some discussion about this (e.g. here and here), but I can't seem to find consensus. The crucial thing that I haven't seen mentioned in these discussions is that applying batch normalization before ReLU switches off half…
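For reference, the two orderings being compared, written as hypothetical Keras blocks (layer sizes are arbitrary placeholders):

```python
import tensorflow as tf

# Ordering A: Dense -> BatchNorm -> ReLU (normalize pre-activations).
bn_before_relu = tf.keras.Sequential([
    tf.keras.Input(shape=(64,)),
    tf.keras.layers.Dense(128, use_bias=False),
    tf.keras.layers.BatchNormalization(),
    tf.keras.layers.ReLU(),
])

# Ordering B: Dense -> ReLU -> BatchNorm (normalize post-activations).
bn_after_relu = tf.keras.Sequential([
    tf.keras.Input(shape=(64,)),
    tf.keras.layers.Dense(128),
    tf.keras.layers.ReLU(),
    tf.keras.layers.BatchNormalization(),
])
```

In ordering A, `use_bias=False` is a common choice because BatchNormalization's own learned shift makes the Dense bias redundant.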
4 votes • 1 answer

Neural network doesn't seem to converge with ReLU but it does with Sigmoid?

I'm not really sure if this is the sort of question to ask here, since it is less a general question about AI and more about the coding of it; however, I thought it wouldn't fit on Stack Overflow. I have been programming a multilayer perceptron…
4 votes • 2 answers

Is PReLU superfluous with respect to ReLU?

Why do people use the PReLU activation? $\operatorname{PReLU}[x] = \operatorname{ReLU}[x] + \operatorname{ReLU}[p \cdot x]$, with the parameter $p$ typically being a small negative number. If a fully connected layer is followed by a $\operatorname{ReLU}$ layer with at least two elements, then the combined layers…
3 votes • 1 answer

Can residual neural networks use activation functions other than ReLU?

In many diagrams, as seen below, residual neural networks are only depicted with ReLU activation functions, but can residual NNs also use other activation functions, such as the sigmoid, hyperbolic tangent, etc.?
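A minimal sketch (hypothetical, not from any particular paper) of a residual block in which the activation is a parameter, so $\tanh$ or sigmoid can be swapped in for ReLU:

```python
import tensorflow as tf

def residual_block(x, units, activation="tanh"):
    # Two dense layers with a chosen activation, plus the identity shortcut.
    shortcut = x
    y = tf.keras.layers.Dense(units, activation=activation)(x)
    y = tf.keras.layers.Dense(units)(y)
    y = tf.keras.layers.Add()([shortcut, y])
    return tf.keras.layers.Activation(activation)(y)

inputs = tf.keras.Input(shape=(32,))
h = tf.keras.layers.Dense(32)(inputs)          # project to the block width
outputs = residual_block(h, units=32, activation="tanh")
model = tf.keras.Model(inputs, outputs)
```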
3 votes • 1 answer

How are exploding numbers in a forward pass of a CNN combated?

Take AlexNet, for example: in this case, only the ReLU activation function is used. Because ReLU cannot saturate, activations can instead explode, as in the following example: say I have a weight matrix of [-1,-2,3,4] and inputs of [ReLU(4),…
Recessive • 1,346 • 8 • 21
3 votes • 1 answer

How does backpropagation with unbounded activation functions such as ReLU work?

I am in the process of writing my own basic machine learning library in Python as an exercise to gain a good conceptual understanding. I have successfully implemented backpropagation for activation functions such as $\tanh$ and the sigmoid function.…
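A minimal plain-NumPy sketch (with made-up helper names) of the backward pass through one dense layer with ReLU; the point for unbounded activations is that backpropagation only needs the local derivative (here 0 or 1), not a bounded output range:

```python
import numpy as np

def dense_relu_forward(x, W, b):
    z = W @ x + b                 # pre-activation
    a = np.maximum(0.0, z)        # ReLU
    return a, z

def dense_relu_backward(x, W, z, grad_a):
    # Chain rule: d(loss)/dz = d(loss)/da * relu'(z), with relu'(z) = 1[z > 0].
    grad_z = grad_a * (z > 0)
    grad_W = np.outer(grad_z, x)  # d(loss)/dW
    grad_b = grad_z               # d(loss)/db
    grad_x = W.T @ grad_z         # gradient passed to the previous layer
    return grad_W, grad_b, grad_x

rng = np.random.default_rng(0)
x, W, b = rng.normal(size=3), rng.normal(size=(4, 3)), np.zeros(4)
a, z = dense_relu_forward(x, W, b)
grads = dense_relu_backward(x, W, z, grad_a=np.ones(4))
```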
2 votes • 2 answers

Why do non-linear activation functions that produce values larger than 1 or smaller than 0 work?

Why do non-linear activation functions that produce values larger than 1 or smaller than 0 work? My understanding is that neurons can only produce values between 0 and 1, and that this assumption can be used in things like cross-entropy. Are my…
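A small sketch (my own, not from the question) of the usual resolution: hidden activations may be arbitrarily large, and only the output layer is squashed into $(0, 1)$ before a cross-entropy loss.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=10)
W1 = rng.normal(size=(16, 10)) * 3.0   # deliberately large weights
w2 = rng.normal(size=16) * 0.01

h = np.maximum(0.0, W1 @ x)            # hidden ReLU activations, not confined to [0, 1]
p = 1.0 / (1.0 + np.exp(-(w2 @ h)))    # sigmoid output, strictly inside (0, 1)

y = 1.0                                # binary label
loss = -(y * np.log(p) + (1 - y) * np.log(1 - p))  # binary cross-entropy is still well defined
print(h.max(), p, loss)                # hidden values can be large; p stays in (0, 1), loss is finite
```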