
In this blog post: http://www.argmin.net/2016/04/18/bottoming-out/

Prof. Recht shows two plots:

[First plot from the blog post]

[Second plot from the blog post]

He says one of the reasons the second plot has a smaller train-test gap is that the model was trained with a lower learning rate (and he also manually drops the learning rate at epoch 120).

Why would a lower learning rate reduce overfitting?

  • A bigger learning rate makes the loss curve oscillate more, and it fits the training data only – Dee Jul 14 '20 at 09:59
  • 1
    BTW, two represented figures have not the same y-scale and it can be confusing and deceptive! – OmG Jul 14 '20 at 12:39

0 Answers