It seems that vanilla (older) RNNs are limited in the use cases they can handle and have been outperformed by other recurrent architectures, such as the LSTM and GRU.
2 Answers
These newer RNNs (LSTMs and GRUs) have greater memory control, allowing previous values to persist or to be reset as necessary over many steps of a sequence, which avoids "gradient decay", the gradual degradation of the values passed from step to step. LSTM and GRU networks make this memory control possible with memory blocks and structures called "gates" that pass or reset values as appropriate.
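To make the gate idea concrete, here is a minimal sketch of a single LSTM step in NumPy. The parameter names `W`, `U`, `b` and the toy sizes are illustrative assumptions, not something defined in the answer; real implementations would use a library layer such as `torch.nn.LSTM`.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_cell(x, h_prev, c_prev, W, U, b):
    """One LSTM step. W, U, b hold the stacked parameters for the
    input, forget, output and candidate gates (4 * hidden rows)."""
    hidden = h_prev.shape[0]
    z = W @ x + U @ h_prev + b            # pre-activations for all four gates
    i = sigmoid(z[0 * hidden:1 * hidden]) # input gate: how much new information to write
    f = sigmoid(z[1 * hidden:2 * hidden]) # forget gate: how much old memory to keep
    o = sigmoid(z[2 * hidden:3 * hidden]) # output gate: how much memory to expose
    g = np.tanh(z[3 * hidden:4 * hidden]) # candidate values to add to the cell
    c = f * c_prev + i * g                # memory cell: persists when f is near 1, resets when near 0
    h = o * np.tanh(c)                    # hidden state passed to the next step
    return h, c

# Toy usage: run the cell over a short random sequence.
rng = np.random.default_rng(0)
n_in, n_hid = 8, 16
W = rng.normal(scale=0.1, size=(4 * n_hid, n_in))
U = rng.normal(scale=0.1, size=(4 * n_hid, n_hid))
b = np.zeros(4 * n_hid)
h, c = np.zeros(n_hid), np.zeros(n_hid)
for x in rng.normal(size=(20, n_in)):
    h, c = lstm_cell(x, h, c, W, U, b)
```

The key line is the cell update `c = f * c_prev + i * g`: because it is additive and gated, values can be carried forward nearly unchanged for as many steps as the forget gate stays open, which is exactly the "persist or reset" behaviour described above.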

LSTMs and GRUs are more effective than standard RNNs at learning long-range dependencies because they explicitly address the vanishing and exploding gradient problems. These are numerical problems in which the values of the gradient vector (the vector of partial derivatives of the loss function with respect to the model's parameters) shrink towards zero or blow up as they are propagated backwards, and they arise when training recurrent neural networks with gradient descent and back-propagation through time.
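As a sketch of where these problems come from (using the common notation $h_t$ for the hidden state at step $t$ and $\mathcal{L}_t$ for the loss at that step, which the answer does not define explicitly): back-propagation through time chains together one Jacobian per intervening step, so the gradient with respect to an earlier state is a long product,

$$
\frac{\partial \mathcal{L}_t}{\partial h_k}
= \frac{\partial \mathcal{L}_t}{\partial h_t}\,
  \prod_{j=k+1}^{t} \frac{\partial h_j}{\partial h_{j-1}}.
$$

If the norms of these per-step Jacobians are consistently below 1, the product shrinks exponentially in $t - k$ (vanishing gradients); if they are consistently above 1, it grows exponentially (exploding gradients). The gated, additive cell update in LSTMs and GRUs is designed to keep this product closer to 1 along the memory path.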
