I'm watching the video Recurrent Neural Networks (RNN) | RNN LSTM | Deep Learning Tutorial | Tensorflow Tutorial | Edureka, where the author says that the LSTM and GRU architectures help reduce the vanishing gradient problem. How do LSTM and GRU prevent the vanishing gradient problem?
2 Answers
LSTMs mitigate the problem through an additive gradient structure: the cell state is updated by addition rather than by repeated multiplication, and the gradient flowing along it is modulated directly by the forget gate's activations. Because the gates are recomputed at every time step, the network can learn to keep this gradient path close to 1 and so preserve the error signal over long sequences.
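To make this concrete, here is a rough sketch of the gradient path through the cell state, written in standard LSTM notation (this derivation is my own summary, not taken from the video, and it treats the gates' dependence on the previous cell state as secondary):

```latex
% Cell-state update: additive, gated by the forget gate f_t
\[
c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t
\]
% Gradient of the cell state with respect to the previous cell state
% (ignoring the gates' indirect dependence on c_{t-1}):
\[
\frac{\partial c_t}{\partial c_{t-1}} \approx f_t
\]
% Over k steps the gradient is a product of forget-gate activations,
% which the network can learn to keep close to 1:
\[
\frac{\partial c_t}{\partial c_{t-k}} \approx \prod_{j=t-k+1}^{t} f_j
\]
```

Compare this with a vanilla RNN, where the corresponding factor is a product of weight matrices and tanh derivatives, which tends to shrink toward zero as the number of steps grows.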

Saurav Maheshkar
An LSTM carries the previous cell state forward to the current time step through a largely additive update, instead of repeatedly squashing it through matrix multiplications and nonlinearities the way a vanilla RNN does. This simple yet effective design helps reduce the vanishing gradient, because each state retains information about all of the previous states. Think of it like trading: if you still have all the numbers from a year ago, you can make better decisions! A minimal code sketch of one LSTM step is below.
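Here is a minimal sketch of a single LSTM step in NumPy (not from the linked article; the parameter names and sizes are made up for illustration), showing how the previous cell state is carried forward through a mostly additive update:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM step. W, U, b hold the stacked gate parameters
    (input, forget, output, candidate), each of hidden size H."""
    H = h_prev.shape[0]
    z = W @ x_t + U @ h_prev + b          # pre-activations for all four gates
    i = sigmoid(z[0:H])                   # input gate
    f = sigmoid(z[H:2 * H])               # forget gate
    o = sigmoid(z[2 * H:3 * H])           # output gate
    c_tilde = np.tanh(z[3 * H:4 * H])     # candidate cell state
    # Additive update: c_prev is scaled only by the forget gate, so the
    # gradient path back to earlier steps is a product of f's, which the
    # network can learn to keep near 1 instead of shrinking to 0.
    c_t = f * c_prev + i * c_tilde
    h_t = o * np.tanh(c_t)
    return h_t, c_t

# Toy usage with made-up sizes
rng = np.random.default_rng(0)
X_DIM, H_DIM = 4, 3
W = rng.normal(size=(4 * H_DIM, X_DIM))
U = rng.normal(size=(4 * H_DIM, H_DIM))
b = np.zeros(4 * H_DIM)
h, c = np.zeros(H_DIM), np.zeros(H_DIM)
h, c = lstm_step(rng.normal(size=X_DIM), h, c, W, U, b)
```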
I highly recommend this article, which explains the concept very well.

Minh-Long Luu
- Can you clarify which parts of that article are useful to understanding how LSTMs avoid the vanishing gradient problem? I've searched it for "vanish", "gradient" and "derivative" and found nothing. – Sycorax Mar 18 '23 at 18:33
- "Long-term dependencies" is the keyword; it is closely tied to the vanishing gradient. Learning those dependencies requires backpropagating through many time steps, which is where the vanishing appears. – Minh-Long Luu Mar 19 '23 at 01:27
- Perhaps you could [edit] your answer to explain in detail how the LSTM mechanism overcomes the vanishing gradient problem. As it stands, the connection between the two is not entirely clear. – Sycorax Mar 30 '23 at 14:58