
I am training an A3C agent with a stacked LSTM.

During early training, my model was earning a decent positive reward. However, after many episodes, the reward dropped to zero and has stayed there for a long time. Is this because of the LSTM?

Is this normal?

Should I expect it to recover by the end of training, or should I terminate training and increase the capacity of my network?


1 Answer


What you're describing is not impossible for an RL model, but it is rare. It's a known issue that some RL algorithms succeed or fail depending on the random seed. I once implemented the same model to play KungFuMaster-v0 during a university RL course, and the code seemed fine (two people, including the teacher, reviewed it carefully and found no bugs). I remember running it ten times in a row, and one run out of ten showed that nasty behavior. The teacher's notes for the task said: if the reward suddenly drops to 0 and stays there for a long time, check your code, because there is a high probability of a bug. So I'd say that if your network works fine 9 times out of 10, there are probably no bugs; otherwise, if I were you, I'd check the code carefully.
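As a practical way to run this check, you can repeat the same training under several random seeds and count how often the reward collapses. Here is a minimal sketch in Python; `train_a3c(seed)` is a hypothetical placeholder for your own training entry point (it should run one full training session and return the list of episode rewards), while the seeding itself is standard `random`/NumPy/PyTorch.

```python
import random

import numpy as np
import torch


def set_global_seed(seed: int) -> None:
    """Seed Python, NumPy, and PyTorch so each run is reproducible."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)


def reward_collapsed(episode_rewards, window=100, threshold=0.0):
    """Flag a run whose last `window` episode rewards never exceed `threshold`."""
    tail = episode_rewards[-window:]
    return max(tail) <= threshold


collapses = 0
for seed in range(10):
    set_global_seed(seed)
    # Hypothetical: replace with your own A3C training loop, which
    # returns the per-episode rewards of one full training session.
    episode_rewards = train_a3c(seed)
    if reward_collapsed(episode_rewards):
        collapses += 1
        print(f"seed {seed}: reward collapsed to ~0")
    else:
        print(f"seed {seed}: mean of last 100 episodes = "
              f"{np.mean(episode_rewards[-100:]):.2f}")

# If most seeds collapse, suspect a bug; if only the odd seed does,
# it is more likely ordinary RL seed sensitivity.
print(f"{collapses}/10 runs collapsed")
```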
