
I'm using two sources to try to understand why LSTMs reduce the likelihood of the vanishing gradient problem associated with RNNs.

Both sources say that LSTMs are able to reduce the likelihood of the vanishing gradient problem because:

  1. The gradient contains the forget gate's vector of activations
  2. The gradient is an additive combination of four terms, which helps keep the gradient values balanced

I understand (1), but I don't understand what (2) means.
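
For reference, here is my own sketch of the derivative I believe both sources are referring to, assuming the standard LSTM cell update $c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t$, where $f_t$, $i_t$, and $\tilde{c}_t$ all depend on $h_{t-1} = o_{t-1} \odot \tanh(c_{t-1})$:

$$
\frac{\partial c_t}{\partial c_{t-1}}
= \underbrace{f_t}_{\text{direct path}}
+ \underbrace{c_{t-1} \odot \frac{\partial f_t}{\partial c_{t-1}}}_{\text{via } f_t}
+ \underbrace{\tilde{c}_t \odot \frac{\partial i_t}{\partial c_{t-1}}}_{\text{via } i_t}
+ \underbrace{i_t \odot \frac{\partial \tilde{c}_t}{\partial c_{t-1}}}_{\text{via } \tilde{c}_t}
$$

If this sum is what the sources mean by "four gradient values", is the point that the four terms can take different signs and magnitudes at each time step, so the product of these Jacobians over many steps doesn't have to shrink the way the single repeated factor does in a vanilla RNN?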

Any insight would be greatly appreciated!

