AFAIK, momentum is quite useful when training CNNs and can speed up training substantially without any drop in validation accuracy.
I've recently learned that it is not as helpful for RNNs, where plain SGD is preferred.
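For reference, by momentum I mean the usual heavy-ball update (standard formulation, with $\alpha$ the momentum coefficient and $\epsilon$ the learning rate):

$$v \leftarrow \alpha v - \epsilon \nabla_\theta J(\theta), \qquad \theta \leftarrow \theta + v,$$

which reduces to plain SGD when $\alpha = 0$.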
For example, Deep Learning by Goodfellow et al. says (section 10.11, page 401):
Both of these approaches have largely been replaced by simply using SGD (even without momentum) applied to LSTMs.
The passage is about LSTMs, and as I understand it, "both of these approaches" refers to second-order methods and first-order methods with momentum, respectively.
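Concretely, the two settings I'm comparing differ only in one optimizer flag. Here's a minimal PyTorch sketch of what I mean (the model, sizes, loss, and hyperparameters are placeholders of my own, not anything from the book):

```python
import torch
import torch.nn as nn

# Placeholder recurrent model and batch; hypothetical sizes.
model = nn.LSTM(input_size=16, hidden_size=32)
x = torch.randn(5, 8, 16)  # (seq_len, batch, features)

# Setting 1: plain SGD, as the book recommends for LSTMs.
opt_plain = torch.optim.SGD(model.parameters(), lr=0.1)

# Setting 2: SGD with momentum, as commonly used for CNNs.
opt_momentum = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)

out, _ = model(x)
loss = out.pow(2).mean()  # dummy loss, just to produce gradients
loss.backward()
opt_momentum.step()       # one momentum step (PyTorch uses v = mu*v + g; p -= lr*v)
```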
What causes this discrepancy?