Questions tagged [gradient-clipping]
3 questions
4 votes · 1 answer
Combine multiple losses with gradient descent
I am optimizing a neural network with Adam using 3 different losses. Their scales are very different, and the current method is either to sum the losses and clip the gradient or to manually weight them within the sum. Something like:…

Simon · 153
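The pattern this question describes looks roughly like the sketch below. Everything here is an illustrative stand-in (the model, the three loss terms, the hand-picked weights, and the max_norm value); none of it comes from the question itself.

    import torch

    net = torch.nn.Linear(10, 1)                      # stand-in model
    opt = torch.optim.Adam(net.parameters(), lr=1e-3)

    x, y = torch.randn(32, 10), torch.randn(32, 1)
    pred = net(x)

    # Three losses on very different scales (values are made up).
    loss_a = torch.nn.functional.mse_loss(pred, y)    # roughly O(1)
    loss_b = pred.abs().mean() * 100.0                # roughly O(100)
    loss_c = pred.pow(2).mean() * 0.01                # roughly O(0.01)

    # Option 1: manually weight the terms so no single loss dominates.
    total = 1.0 * loss_a + 0.01 * loss_b + 100.0 * loss_c

    opt.zero_grad()
    total.backward()
    # Option 2 (or a complement): clip the combined gradient by norm.
    torch.nn.utils.clip_grad_norm_(net.parameters(), max_norm=1.0)
    opt.step()

Note that clipping rescales the whole gradient vector uniformly, so unlike per-loss weights it does not change the relative contribution of the three terms.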
3 votes · 1 answer
What exactly happens in gradient clipping by norm?
Consider the following description of gradient clipping in PyTorch:

    torch.nn.utils.clip_grad_norm_(parameters, max_norm, norm_type=2.0, error_if_nonfinite=False)

Clips gradient norm of an iterable of parameters. The norm is computed over all…

hanugm · 3,571
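The behaviour the docstring describes can be reproduced by hand, which makes the question concrete: a single global norm is computed over all parameter gradients (as if they were concatenated into one vector), and if it exceeds max_norm every gradient is scaled by the same factor. The following is an illustrative re-derivation for the norm_type=2.0 case, not PyTorch's actual source:

    import torch

    def clip_by_global_norm(parameters, max_norm, norm_type=2.0):
        grads = [p.grad for p in parameters if p.grad is not None]
        # One norm over all gradients together, i.e. the norm of the
        # per-tensor norms.
        total_norm = torch.norm(
            torch.stack([torch.norm(g.detach(), norm_type) for g in grads]),
            norm_type,
        )
        # Shared scaling factor; gradients are only shrunk, never grown.
        clip_coef = max_norm / (total_norm + 1e-6)
        if clip_coef < 1:
            for g in grads:
                g.detach().mul_(clip_coef)
        return total_norm

The built-in torch.nn.utils.clip_grad_norm_ does the same job in-place and returns the total norm as computed before clipping.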
1 vote · 0 answers
What is the effect of gradient clipping by norm on the performance of a model?
It is recommended to apply gradient clipping by norm in the case of exploding gradients. The following quote is taken from this answer:

One way to be sure it is exploding gradients is if the loss is unstable and not improving, or if the loss shows…

hanugm · 3,571
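A practical way to study that effect is to log the value returned by clip_grad_norm_, which is the gradient norm before clipping: comparing it to max_norm shows how often, and how strongly, updates are actually being rescaled. A minimal sketch, assuming a stand-in model and random data in place of a real training setup:

    import torch

    model = torch.nn.Linear(10, 1)                    # stand-in model
    opt = torch.optim.SGD(model.parameters(), lr=0.1)
    loss_fn = torch.nn.MSELoss()
    max_norm = 1.0                                    # illustrative threshold

    for step in range(100):
        x, y = torch.randn(8, 10), torch.randn(8, 1)
        opt.zero_grad()
        loss_fn(model(x), y).backward()
        # Returns the total gradient norm *before* clipping.
        pre_clip = torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm)
        if pre_clip > max_norm:
            print(f"step {step}: norm {pre_clip:.2f} clipped to {max_norm}")
        opt.step()

If the norm almost never exceeds max_norm, clipping is effectively a no-op; if it is clipped on most steps, the threshold itself starts acting like an extra learning-rate control, which is where an effect on final performance would come from.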