The sigmoid, tanh, and ReLU are popular and useful activation functions in the literature.
The following excerpt, taken from p. 4 of Neural Networks and Neural Language Models, says that tanh
has a couple of interesting properties:
> For example, the tanh function has the nice properties of being smoothly differentiable and mapping outlier values toward the mean.
A function is said to be differentiable if it is differentiable at every point in its domain. The domain of tanh
is $\mathbb{R}$, and $\tanh(x) = \dfrac{e^x-e^{-x}}{e^x+e^{-x}}$ is differentiable on $\mathbb{R}$.
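For reference, working out the first derivative myself (this is not from the excerpt), I get

$$\frac{d}{dx}\tanh(x) = 1 - \tanh^2(x),$$

which is again a polynomial in tanh, so as far as I can tell every higher-order derivative also exists on all of $\mathbb{R}$.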
But what exactly is meant by "smoothly differentiable" in the case of the tanh
activation function?