It seems that vanilla (older) RNNs are limited in the use cases they can handle and have been outperformed by other recurrent architectures, such as the LSTM and GRU.
2 Answers
These newer RNNs (LSTMs and GRUs) have greater memory control, allowing previous values to persist or to be reset as necessary over many steps of a sequence, which avoids "gradient decay", the gradual degradation of the values passed from step to step. LSTM and GRU networks make this memory control possible with memory blocks and structures called "gates" that pass or reset values as appropriate.
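To make the gate idea concrete, here is a minimal sketch of a single LSTM step in NumPy. The parameter names `W`, `U`, `b` and the toy sizes are illustrative assumptions, not something defined in the answer; real implementations would use a library layer such as `torch.nn.LSTM`.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_cell(x, h_prev, c_prev, W, U, b):
    """One LSTM step. W, U, b hold the stacked parameters for the
    input, forget, output and candidate gates (4 * hidden rows)."""
    hidden = h_prev.shape[0]
    z = W @ x + U @ h_prev + b            # pre-activations for all four gates
    i = sigmoid(z[0 * hidden:1 * hidden]) # input gate: how much new information to write
    f = sigmoid(z[1 * hidden:2 * hidden]) # forget gate: how much old memory to keep
    o = sigmoid(z[2 * hidden:3 * hidden]) # output gate: how much memory to expose
    g = np.tanh(z[3 * hidden:4 * hidden]) # candidate values to add to the cell
    c = f * c_prev + i * g                # memory cell: persists when f is near 1, resets when near 0
    h = o * np.tanh(c)                    # hidden state passed to the next step
    return h, c

# Toy usage: run the cell over a short random sequence.
rng = np.random.default_rng(0)
n_in, n_hid = 8, 16
W = rng.normal(scale=0.1, size=(4 * n_hid, n_in))
U = rng.normal(scale=0.1, size=(4 * n_hid, n_hid))
b = np.zeros(4 * n_hid)
h, c = np.zeros(n_hid), np.zeros(n_hid)
for x in rng.normal(size=(20, n_in)):
    h, c = lstm_cell(x, h, c, W, U, b)
```

The key line is the cell update `c = f * c_prev + i * g`: because it is additive and gated, values can be carried forward nearly unchanged for as many steps as the forget gate stays open, which is exactly the "persist or reset" behaviour described above.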

LSTMs and GRUs are more effective than standard RNNs at learning long-range dependencies because they explicitly address the vanishing and exploding gradient problems. These are numerical problems in which the values of the gradient vector (the vector of partial derivatives of the loss function with respect to the model's parameters) shrink towards zero or blow up as they are propagated backwards, and they arise when training recurrent neural networks with gradient descent and back-propagation through time.
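As a sketch of where these problems come from (using the common notation $h_t$ for the hidden state at step $t$ and $\mathcal{L}_t$ for the loss at that step, which the answer does not define explicitly): back-propagation through time chains together one Jacobian per intervening step, so the gradient with respect to an earlier state is a long product,

$$
\frac{\partial \mathcal{L}_t}{\partial h_k}
= \frac{\partial \mathcal{L}_t}{\partial h_t}\,
  \prod_{j=k+1}^{t} \frac{\partial h_j}{\partial h_{j-1}}.
$$

If the norms of these per-step Jacobians are consistently below 1, the product shrinks exponentially in $t - k$ (vanishing gradients); if they are consistently above 1, it grows exponentially (exploding gradients). The gated, additive cell update in LSTMs and GRUs is designed to keep this product closer to 1 along the memory path.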
