Consider the following paragraph from the Numerical Computation chapter of the Deep Learning book.

When $f'(x) = 0$, the derivative provides no information about which direction to move. Points where $f'(x) = 0$ are known as critical points, or stationary points. A local minimum is a point where $f(x)$ is lower than at all neighboring points, so it is no longer possible to decrease $f(x)$ by making infinitesimal steps. A local maximum is a point where $f(x)$ is higher than at all neighboring points, so it is not possible to increase $f(x)$ by making infinitesimal steps. Some critical points are neither maxima nor minima. These are known as saddle points.

In short, points where $f'(x) = 0$ are called critical points, or stationary points.

But, according to mathematical terminology, the definitions are as follows:

#1: Critical point

A function $y=f(x)$ has critical points at all points $x_0$ where $f'(x_0)=0$ or $f(x)$ is not differentiable.

#2: Stationary point

A point $x_0$ at which the derivative of a function $f(x)$ vanishes, $f'(x_0)=0$. A stationary point may be a minimum, maximum, or inflection point.

The definition given in the Deep Learning book matches the definition of a stationary point exactly, since its only condition is $f'(x)=0$. It does not match the definition of a critical point, since a critical point can also be a point where $f'(x)$ does not exist.

Is there any reason for using the terms critical points and stationary points interchangeably? Is there no need to address the points where $f'(x)$ does not exist?

hanugm

2 Answers

A critical point of a function $f$ can be

  1. a stationary point (i.e. $f'(x) = 0$), or
  2. a point where the derivative is undefined (for example, for the absolute value function $f(x) = |x|$, $x=0$ is a critical point, as $f$ is not differentiable at $x=0$).

So, all stationary points are critical points.
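
The two cases above can be told apart numerically by comparing one-sided slopes. The following is only a rough sketch: the helper names `one_sided_slopes` and `classify`, the step size `h`, and the tolerance `tol` are illustrative choices, and a finite-difference check is a heuristic, not a proof of (non-)differentiability.

```python
def one_sided_slopes(f, x0, h=1e-6):
    """Return (left, right) finite-difference slopes of f at x0."""
    left = (f(x0) - f(x0 - h)) / h
    right = (f(x0 + h) - f(x0)) / h
    return left, right


def classify(f, x0, h=1e-6, tol=1e-4):
    """Heuristically classify x0 as a stationary or merely critical point of f."""
    left, right = one_sided_slopes(f, x0, h)
    if abs(left - right) > tol:
        # The one-sided slopes disagree, so the derivative looks undefined here.
        return "critical (not differentiable)"
    if abs(left) <= tol:
        # The slopes agree and are (numerically) zero.
        return "critical and stationary"
    return "not critical"


print(classify(abs, 0.0))              # the absolute value example from the list
print(classify(lambda x: x**2, 0.0))   # a minimum
print(classify(lambda x: x**3, 0.0))   # stationary, but neither minimum nor maximum
print(classify(lambda x: x**2, 1.0))   # an ordinary point
```

Under these assumptions, $|x|$ at $x=0$ is reported as critical without being stationary, while $x^2$ and $x^3$ at $x=0$ are both.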

These notes provide more examples of how to find critical points of a function, so they could be useful.

nbro
  • Is your writing completed? Please address, if possible, the **Is there no need to address the points where f′(x) does not exist?** part, which is causing issue. – hanugm Aug 21 '21 at 00:51
  • @hanugm Are you asking if we can ignore the critical points that are not stationary points? If yes, in which context? – nbro Aug 21 '21 at 01:02
  • yea true, else they may not interchangeably use them. – hanugm Aug 21 '21 at 01:13
  • @hanugm The point is: you can always refer to a stationary point as a "critical point", but you cannot always refer to a critical point as a "stationary point". Some cases, where the derivative is undefined, may require a little bit of care, so you don't have to ignore all critical points. I've just read the answer by Taw, and it seems to give an example of where you need to pay attention to functions that are not differentiable at some points (e.g. ReLU). – nbro Aug 21 '21 at 10:11

From reading the text, it's clear that the authors are using critical point to mean the same thing as stationary point, so they are not using the proper mathematical definition.

More generally, automatic differentiation will return a gradient even at nondifferentiable points. Ask TensorFlow or PyTorch to take the gradient of $\text{ReLU}(x)\big|_{x=0}$. They return 0, even though they should technically return NaN, since the derivative does not exist there. Theoretically, the question of how to deal with nonsmooth functions is well understood (see the subdifferential, for example). In the ReLU example, the returned value is actually an element of the subdifferential, but there are pathological examples where this is not true.
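
The subdifferential claim for ReLU can be checked by hand, without any framework. For a convex function of one variable, the subdifferential at a point is the interval between the left and right derivatives. The sketch below estimates both with finite differences; `relu`, `left_right_derivatives`, and `reported_gradient` are illustrative names, and the value `0.0` is simply the one this answer says the frameworks return, not something measured here.

```python
def relu(x):
    """ReLU(x) = max(x, 0)."""
    return max(x, 0.0)


def left_right_derivatives(f, x0, h=1e-6):
    """Finite-difference estimates of the left and right derivatives of f at x0."""
    left = (f(x0) - f(x0 - h)) / h
    right = (f(x0 + h) - f(x0)) / h
    return left, right


left, right = left_right_derivatives(relu, 0.0)

# The one-sided derivatives differ, so ReLU is not differentiable at 0;
# its subdifferential there is the whole interval [left, right].
is_differentiable = left == right

# Assumption: 0.0 is the gradient value reported in the answer above.
reported_gradient = 0.0
in_subdifferential = left <= reported_gradient <= right

print(left, right, is_differentiable, in_subdifferential)
```

Any value between the two one-sided derivatives would be an equally valid subgradient at $x=0$; returning one of them is a convention, not the derivative.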

Taw