In this blog post, http://www.argmin.net/2016/04/18/bottoming-out/, Prof. Recht shows two plots of train and test performance over epochs.
He says one of the reasons the second model has a smaller train-test gap is that it was trained with a lower learning rate (and he also drops the learning rate manually at epoch 120).
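For concreteness, here is a minimal sketch of that kind of schedule in PyTorch. This is my own assumed setup (toy model, made-up data, a 0.01 base rate, and a 10x drop factor), not code from the post; it just illustrates a lower base learning rate combined with a manual drop at epoch 120.

```python
import torch
import torch.nn as nn

# Toy stand-ins for the real model and data (assumptions, not from the post)
model = nn.Linear(10, 2)
x, y = torch.randn(64, 10), torch.randint(0, 2, (64,))
loss_fn = nn.CrossEntropyLoss()

# Lower base learning rate (0.01 is an assumed value)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)

# Drop the learning rate by 10x at epoch 120, as described in the post
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[120], gamma=0.1)

for epoch in range(200):
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    optimizer.step()
    scheduler.step()  # advance the schedule once per epoch
```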
Why would a lower learning rate reduce overfitting?