The norm is a mathematical operation that can be applied to vectors or matrices, informally measuring the "length" of such mathematical objects.
Since the gradient $g=\nabla_\theta f(x;\theta)$ of a differentiable function $f:\mathbb R^N\to \mathbb R^M$ with respect to some parameters $\theta$ can be a scalar (if $N=M=1$), a vector (if $N>1, M=1$), or a matrix (if $N>1, M>1$), the "gradient norm" is just the norm operation applied to $g$.
- If the gradient is a scalar, the norm is just its absolute value: $|g|$.
- For a vector, the norm measures its length or magnitude. There are various notions, resulting in different norms; the most commonly used (especially in the context of gradients) is the Euclidean norm (also called the $l_2$-norm): $\|g\|_2 = \sqrt{\sum_i g_i^2}$.
- For matrices, the concept of a vector norm is extended. For example, the Frobenius norm, $\|g\|_F$, is the matrix counterpart of the Euclidean norm: the square root of the sum of the squared entries (see the sketch after this list).
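As a minimal sketch of the three cases above (using NumPy here, which is just one convenient choice; the example values are arbitrary):

```python
import numpy as np

# Scalar gradient: the norm is the absolute value.
g_scalar = -3.0
print(abs(g_scalar))             # 3.0

# Vector gradient: Euclidean (l2) norm, sqrt of the sum of squared entries.
g_vector = np.array([3.0, 4.0])
print(np.linalg.norm(g_vector))  # 5.0

# Matrix gradient: Frobenius norm, also sqrt of the sum of squared entries.
g_matrix = np.array([[1.0, 2.0],
                     [3.0, 4.0]])
print(np.linalg.norm(g_matrix))  # ~5.477 (sqrt of 30)
```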
Now, in the context of DL, the gradient is usually a list of vectors and matrices, one per layer or parameter tensor. So, when referring to the "gradient norm", one usually means the global $l_2$-norm of all the gradients, computed as follows (see tf.linalg.global_norm):
$$
\|G\|_2 = \sqrt{\sum_{g\in G}\|g\|_2^2},
$$
which is the square root of the sum of the squared Euclidean norms of each gradient $g$ in the list $G$ of gradients.
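As an illustration, the global norm can be computed by hand and checked against the TensorFlow utility mentioned above (the concrete tensor values are made up for the example):

```python
import numpy as np
import tensorflow as tf

# A toy "gradient list": one matrix and one vector, as in a tiny model.
grads = [np.array([[1.0, 2.0], [3.0, 4.0]]), np.array([3.0, 4.0])]

# Manual computation: sqrt of the sum of the squared per-tensor norms.
global_norm = np.sqrt(sum(np.sum(g ** 2) for g in grads))

# TensorFlow equivalent.
tf_global_norm = tf.linalg.global_norm([tf.constant(g) for g in grads])

print(global_norm)             # ~7.416 (sqrt of 30 + 25)
print(tf_global_norm.numpy())  # same value
```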
In DL, the norm of the gradients serves two main purposes: 1) to monitor training and detect whether vanishing/exploding gradients occur, and 2) to perform gradient clipping during optimization, i.e., rescaling the gradients so that their norm does not exceed a threshold; this can be done individually on each $g$, or globally on the whole list $G$.
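A sketch of both clipping variants, using TensorFlow's built-in ops (tf.clip_by_norm for per-tensor clipping, tf.clip_by_global_norm for clipping the whole list); the threshold of 1.0 and the gradient values are arbitrary example choices:

```python
import tensorflow as tf

# Toy gradient list standing in for the gradients of a model's variables.
grads = [tf.constant([[1.0, 2.0], [3.0, 4.0]]), tf.constant([3.0, 4.0])]

# 1) Per-tensor clipping: each g is rescaled so that ||g||_2 <= 1.0.
clipped_each = [tf.clip_by_norm(g, clip_norm=1.0) for g in grads]

# 2) Global clipping: the whole list is rescaled so that ||G||_2 <= 1.0;
#    the pre-clipping global norm is returned as well (useful for monitoring).
clipped_global, global_norm = tf.clip_by_global_norm(grads, clip_norm=1.0)

print(global_norm.numpy())                             # ~7.416
print(tf.linalg.global_norm(clipped_global).numpy())   # ~1.0
```

Logging the returned global norm at every training step is a cheap way to cover purpose 1) as well, since sudden spikes or a collapse towards zero are easy to spot on a plot.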