Consider the following description regarding gradient clipping in PyTorch

```python
torch.nn.utils.clip_grad_norm_(parameters, max_norm, norm_type=2.0, error_if_nonfinite=False)
```

Clips gradient norm of an iterable of parameters.

The norm is computed over all gradients together as if they were concatenated into a single vector. Gradients are modified in-place.
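For context, a typical call in a training loop looks something like the sketch below; the tiny linear model, the random data, and `max_norm=1.0` are placeholders, and only the `clip_grad_norm_` line is the part under discussion.

```python
import torch

# Placeholder model, optimizer, and data; only the clipping call matters here.
model = torch.nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
x, y = torch.randn(32, 10), torch.randn(32, 1)

optimizer.zero_grad()
loss = torch.nn.functional.mse_loss(model(x), y)
loss.backward()

# Rescales all gradients in-place so that their combined L2 norm is at most max_norm.
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)

optimizer.step()
```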

Let the weights of the model, and the gradients of the loss function $L$ with respect to them, be given as below:

\begin{align} w &= [w_1, w_2, w_3, \cdots, w_n] \\ \triangledown &= [\triangledown_1, \triangledown_2, \triangledown_3, \cdots, \triangledown_n] \text{, where } \triangledown_i = \dfrac{\partial L}{\partial w_i} \text{ and } 1 \le i \le n \end{align}

From the description, we first need to compute the gradient norm, i.e. $||\triangledown||$.
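For concreteness, that concatenated norm could be computed by hand with a sketch like the one below (the `total_grad_norm` helper and the tiny linear model are only illustrative, not part of PyTorch):

```python
import torch

# Total L2 norm over all parameter gradients, treated as one concatenated vector.
def total_grad_norm(model: torch.nn.Module) -> torch.Tensor:
    grads = [p.grad.flatten() for p in model.parameters() if p.grad is not None]
    return torch.cat(grads).norm(p=2)

model = torch.nn.Linear(4, 2)
torch.nn.functional.mse_loss(model(torch.randn(8, 4)), torch.randn(8, 2)).backward()
print(total_grad_norm(model))  # the quantity referred to as the gradient norm above
```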

How do we proceed after finding the gradient norm? What does clipping the gradient norm mean mathematically?

nbro
hanugm

1 Answer

Gradient clipping is a technique that tackles exploding gradients. The idea of gradient clipping is very simple: If the gradient gets too large, we rescale it to keep it small. More precisely,

$$ \text{if } \Vert \mathbf{g} \Vert \geq c, \text{then } \mathbf{g} \leftarrow c \frac{\mathbf{g}}{\Vert \mathbf{g} \Vert} $$

where $c$ is a hyperparameter, $\mathbf{g}$ is the gradient, and $\Vert \mathbf{g} \Vert$ is the norm of $\mathbf{g}$.

Since $\frac{\mathbf{g}}{\Vert \mathbf{g} \Vert}$ is a unit vector, after rescaling the new $\mathbf{g}$ will have norm $c$.

Note that if $\Vert \mathbf{g} \Vert < c$, then we don't need to do anything.
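In code, this rule amounts to something like the following minimal sketch, where `g` is a single flattened gradient vector and `c` is the threshold; the names follow the formula above rather than any particular library API:

```python
import torch

def clip_by_norm(g: torch.Tensor, c: float) -> torch.Tensor:
    """Return g rescaled so that its L2 norm is at most c."""
    norm = g.norm(p=2)
    if norm >= c:
        g = c * g / norm  # g/||g|| is a unit vector, scaled back up to length c
    return g

g = torch.tensor([3.0, 4.0])      # ||g|| = 5
print(clip_by_norm(g, c=1.0))     # tensor([0.6000, 0.8000]), norm 1
print(clip_by_norm(g, c=10.0))    # unchanged, since 5 < 10
```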

Check [this article](https://towardsdatascience.com/what-is-gradient-clipping-b8e815cdfb48) for more information.

Archana David
    This appears to be copy-pasted from [another source](https://towardsdatascience.com/what-is-gradient-clipping-b8e815cdfb48), without appropriate attribution. We have certain rules that we require you to follow whenever copying from another source: https://ai.stackexchange.com/help/referencing. Please [edit] your answer to follow them. – D.W. Dec 27 '21 at 02:16