We can enforce some constraints on the functions used in deep learning in order to gain some guarantees for optimization. This is mentioned in the Numerical Computation chapter of the Deep Learning book:

> In the context of deep learning, we sometimes gain some guarantees by restricting ourselves to functions that are either Lipschitz continuous or have Lipschitz continuous derivatives.
So the two options are:
- functions that are Lipschitz continuous
- functions that have Lipschitz continuous derivatives
The definition given for a Lipschitz continuous function is as follows:
A Lipschitz continuous function is a function $f$ whose rate of change is bounded by a Lipschitz constant $\mathcal{L}$:
$$\forall x, \forall y, |f(x)-f(y)| \le \mathcal{L} \|x-y\|_2 $$
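To make that definition concrete, here is a simple example of my own (not from the book): the function $f(x) = \sin(x)$ satisfies

$$\forall x, \forall y, \quad |\sin(x) - \sin(y)| \le 1 \cdot |x - y|,$$

so it is Lipschitz continuous with constant $\mathcal{L} = 1$, since its derivative $\cos(x)$ is bounded by $1$ in absolute value.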
Now, what is meant by having Lipschitz continuous derivatives?
Does this refer to the derivatives of Lipschitz continuous functions? If so, why is it mentioned as a separate option?