I was reading LeCun's "Efficient BackProp", and the authors repeatedly stress the importance of centering the input patterns around 0, which they use to justify the choice of the tanh sigmoid. But if tanh is that good, how come ReLU is so popular in most NNs (which is even odder given that the authors don't mention ReLU at all)?

asked by Struggling_In_Final, edited by nbro
- Are you reading the original 1998 paper? That would be my guess as to why ReLU isn't mentioned. – David Hoelzer Jul 05 '22 at 00:02
- Does this answer your question? [Choices of activation functions](https://ai.stackexchange.com/questions/36189/choices-of-activation-functions) – Martino Jul 06 '22 at 08:43
1 Answer
For a discussion of the advantages of ReLU, see the original paper by Glorot et al. (2011), "Deep Sparse Rectifier Neural Networks". "Efficient BackProp" is a 1998 paper; at that time the use of rectifiers was uncommon and sigmoid-shaped activations were the standard choice.
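
As a rough illustration of the zero-centering point from the question (this sketch is mine, not from either paper; the input distribution and sample size are arbitrary assumptions), here is a minimal NumPy comparison of the output statistics of tanh and ReLU on zero-mean inputs:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=0.0, scale=1.0, size=100_000)  # zero-mean inputs, as LeCun recommends

tanh_out = np.tanh(x)          # odd function: outputs stay roughly zero-centered
relu_out = np.maximum(x, 0.0)  # rectifier: outputs are non-negative, so their mean is positive

print(f"tanh output mean: {tanh_out.mean():+.3f}")  # ~ +0.000
print(f"ReLU output mean: {relu_out.mean():+.3f}")  # ~ +0.399, i.e. 1/sqrt(2*pi) for standard normal inputs
```

So ReLU gives up the zero-centered outputs that "Efficient BackProp" argues for, but in exchange it does not saturate for positive inputs (its gradient there is exactly 1), which is one of the advantages Glorot et al. discuss.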

answered by Martino