The sigmoid, tanh, and ReLU are popular and useful activation functions in the literature.
The following excerpt, taken from p. 4 of Neural Networks and Neural Language Models, says that tanh
has a couple of interesting properties:
> For example, the tanh function has the nice properties of being smoothly differentiable and mapping outlier values toward the mean.
A function is said to be differentiable if it is differentiable at every point in its domain. The domain of tanh
is $\mathbb{R}$, and $\tanh(x) = \dfrac{e^x-e^{-x}}{e^x+e^{-x}}$ is differentiable on $\mathbb{R}$.
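For reference, working out the first derivative myself (this is not from the excerpt), I get

$$\frac{d}{dx}\tanh(x) = 1 - \tanh^2(x),$$

which is again a polynomial in tanh, so as far as I can tell every higher-order derivative also exists on all of $\mathbb{R}$.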
But what exactly is meant by "smoothly differentiable" in the case of the tanh
activation function?