
I am training an A3C agent with a stacked LSTM.

During early training, my model was earning a decent positive reward. However, after many episodes, the reward dropped to zero and has stayed there for a long time. Is this because of the LSTM?

Is this normal?

Should I expect it to recover by the end of training, or should I terminate training and increase the capacity of my network?


1 Answer


What you're describing is not impossible for an RL model, but it is rare. It's a known issue that some RL algorithms succeed or fail depending on the random seed. I once implemented the same model to play KungFuMaster-v0 during a university RL course, and the code seemed fine (two people, including the teacher, reviewed it carefully and found no bugs). I remember running it ten times in a row, and one run out of ten showed that nasty behavior. The teacher's notes for the task said: if the reward suddenly drops to 0 and stays there for a long time, check your code, because there is a high probability of a bug. So I'd say that if your network works fine 9 times out of 10, there are probably no bugs; otherwise, if I were you, I'd check the code carefully.
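As a practical way to run this check, you can repeat the same training under several random seeds and count how often the reward collapses. Here is a minimal sketch in Python; `train_a3c(seed)` is a hypothetical placeholder for your own training entry point (it should run one full training session and return the list of episode rewards), while the seeding itself is standard `random`/NumPy/PyTorch.

```python
import random

import numpy as np
import torch


def set_global_seed(seed: int) -> None:
    """Seed Python, NumPy, and PyTorch so each run is reproducible."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)


def reward_collapsed(episode_rewards, window=100, threshold=0.0):
    """Flag a run whose last `window` episode rewards never exceed `threshold`."""
    tail = episode_rewards[-window:]
    return max(tail) <= threshold


collapses = 0
for seed in range(10):
    set_global_seed(seed)
    # Hypothetical: replace with your own A3C training loop, which
    # returns the per-episode rewards of one full training session.
    episode_rewards = train_a3c(seed)
    if reward_collapsed(episode_rewards):
        collapses += 1
        print(f"seed {seed}: reward collapsed to ~0")
    else:
        print(f"seed {seed}: mean of last 100 episodes = "
              f"{np.mean(episode_rewards[-100:]):.2f}")

# If most seeds collapse, suspect a bug; if only the odd seed does,
# it is more likely ordinary RL seed sensitivity.
print(f"{collapses}/10 runs collapsed")
```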
