An article on https://spell.ml says:

Because Adam manages learning rates internally, it's incompatible with most learning rate schedulers. Anything more complicated than simple learning rate warmup and/or decay will cause the Adam optimizer to "compete" with the learning rate scheduler over its internal LR, worsening model convergence.
I have found the same convergence issues in my own work when using both Adam and a StepLR scheduler.
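For reference, my setup looks roughly like the following (a minimal sketch with a placeholder model and dummy data, not my actual training code):

```python
import torch
import torch.nn as nn

# Placeholder model -- stands in for my actual network.
model = nn.Linear(10, 1)

# Adam with a base learning rate of 1e-3.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# StepLR multiplies the current lr by gamma every step_size epochs.
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.1)

for epoch in range(30):
    x, y = torch.randn(32, 10), torch.randn(32, 1)  # dummy batch
    optimizer.zero_grad()
    loss = nn.functional.mse_loss(model(x), y)
    loss.backward()
    optimizer.step()
    scheduler.step()  # rescales the base lr that Adam's update starts from
```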
I understand that Adam adjusts the learning rate on a per-parameter basis, which perhaps negates the need for a learning rate scheduler in the first place, but why would combining the two lead to convergence issues?
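To make that concrete, my understanding of the Adam update (standard formulation from the Adam paper) is

$$
\theta_{t+1} = \theta_t - \alpha_t \,\frac{\hat{m}_t}{\sqrt{\hat{v}_t} + \epsilon},
\qquad
\hat{m}_t = \frac{m_t}{1 - \beta_1^t}, \quad
\hat{v}_t = \frac{v_t}{1 - \beta_2^t},
$$

where $\alpha_t$ is the global step size that a scheduler such as StepLR rescales, and $\hat{m}_t$, $\hat{v}_t$ are the bias-corrected first and second moment estimates. As far as I can tell, the effective per-parameter step is $\alpha_t / (\sqrt{\hat{v}_t} + \epsilon)$ times the momentum term, so the scheduler's scaling simply multiplies on top of Adam's own adaptive scaling.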
Is there any mathematical reason/proof why using both the Adam optimiser and a learning rate scheduler causes convergence issues?
Is it true that they really "compete" with each other?