
It is recommended to apply gradient clipping by norm in the case of exploding gradients. The following quote is taken from this answer:

> One way to assure it is exploding gradients is if the loss is unstable and not improving, or if loss shows NaN value during training.
>
> Apart from the usual gradient clipping and weights regularization that are recommended...

But I want to know the effect of gradient clipping by norm on the performance of the model in normal or general cases.

Suppose I have a model that I train for 800 epochs without gradient clipping, because there are no exploding gradients. If I train the same model with gradient clipping by norm, even though it is not necessary, does the performance of the model decline?
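
For concreteness, here is a minimal sketch of what I mean by "gradient clipping by norm", assuming PyTorch; the model, data, learning rate, and `max_norm` value are hypothetical placeholders, not something from my actual setup:

```python
import torch
import torch.nn as nn

# Stand-in model, optimizer, and data (placeholders for illustration only).
model = nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.MSELoss()

x = torch.randn(32, 10)
y = torch.randn(32, 1)

for epoch in range(800):
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    # Gradient clipping by norm: rescales all gradients so that their global
    # L2 norm does not exceed max_norm; it is a no-op when the norm is
    # already below the threshold.
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
    optimizer.step()
```

The question is whether adding that single clipping line changes the final performance when the gradient norms rarely (or never) exceed the threshold anyway.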

  • [Why gradient clipping accelerates training: A theoretical justification for adaptivity](https://arxiv.org/abs/1905.11881) They use gradient clipping not just for RNNs, but also for image recognition. – serali Oct 13 '21 at 10:24
  • I think it's a good idea to tag your posts with more general tags, so that the context is immediately clear. For instance, in this case, gradient clipping is a technique used for training neural networks with gradient descent, so, as I did, you could have added the tags that you see now. Of course, this is not a requirement and it's just my personal suggestion, but the tags can also clarify the context (as I said): if you used only one tag and the related topic could occur in two different general contexts, that could make the question unclear. – nbro Oct 13 '21 at 11:46

0 Answers