I was reading LeCun's "Efficient BackProp", and the authors repeatedly stress the importance of centering the input patterns around 0, which they use to justify the choice of the tanh sigmoid. But if tanh is that good, how come ReLU is so popular in most NNs (which is even odder given that the authors don't mention ReLU at all)?

asked by Struggling_In_Final, edited by nbro
- Are you reading the original 1998 paper? That would be my guess as to why ReLU isn't mentioned. – David Hoelzer Jul 05 '22 at 00:02
- Does this answer your question? [Choices of activation functions](https://ai.stackexchange.com/questions/36189/choices-of-activation-functions) – Martino Jul 06 '22 at 08:43
1 Answer
For a discussion of the advantages of ReLU, see the original paper by Glorot et al. (2011), "Deep Sparse Rectifier Neural Networks". "Efficient BackProp" is a 1998 paper; at that time the use of rectifiers was uncommon and sigmoid-shaped activations were the standard choice.
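
As a rough illustration of the zero-centering point from the question (this sketch is mine, not from either paper; the input distribution and sample size are arbitrary assumptions), here is a minimal NumPy comparison of the output statistics of tanh and ReLU on zero-mean inputs:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=0.0, scale=1.0, size=100_000)  # zero-mean inputs, as LeCun recommends

tanh_out = np.tanh(x)          # odd function: outputs stay roughly zero-centered
relu_out = np.maximum(x, 0.0)  # rectifier: outputs are non-negative, so their mean is positive

print(f"tanh output mean: {tanh_out.mean():+.3f}")  # ~ +0.000
print(f"ReLU output mean: {relu_out.mean():+.3f}")  # ~ +0.399, i.e. 1/sqrt(2*pi) for standard normal inputs
```

So ReLU gives up the zero-centered outputs that "Efficient BackProp" argues for, but in exchange it does not saturate for positive inputs (its gradient there is exactly 1), which is one of the advantages Glorot et al. discuss.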

answered by Martino