Why does training converges when the norm of gradient increases?

Asked Mar 29 '23 at 10:15

Active Mar 29 '23 at 10:15

Viewed 20 times

This is from deep learning book by Ian Goodfellow and Yoshua Bengio and Aaron Courville. When training converges well, I thought the gradient should be at local minima. But the book says it often does not arrive at the critical points. Could you explain me why and how it can occur?

Thank you

asked Mar 29 '23 at 10:15

tesio

Why does training converges when the norm of gradient increases?

0 Answers0