
Would anybody share their experience on how to train a hierarchical DQN (h-DQN) to play Montezuma's Revenge? How should I design the reward function? And how should I balance the annealing rates of the exploration parameters at the two levels?

I've been trying to train an agent to solve this game. The agent starts with 6 lives. Every time the agent fetches the key and then loses a life (because the sub-goal network is still unstable or weak), it restarts at the original location, walks straight through the door, and collects a huge extrinsic reward. With an $\epsilon$-greedy rate of 0.1, the agent can still pick the key sub-goal and fetch the key by chance, so the meta-controller ends up always choosing the door as the sub-goal.
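For reference, here is a minimal sketch of the two-level loop I mean (after the h-DQN paper): the meta-controller picks a sub-goal $\epsilon$-greedily, the controller is trained on an intrinsic reward for reaching that sub-goal, and the meta-controller is trained on the extrinsic reward accumulated while the sub-goal was active. The corridor environment (key at one end, door at the other), the tabular Q-tables standing in for the per-sub-goal networks, and all hyperparameters are illustrative assumptions, not the actual Atari setup.

```python
import random
from collections import defaultdict

# Toy corridor: key at cell 0, door at cell 4, agent starts in the middle.
# Extrinsic reward 1 only for reaching the door while holding the key.
KEY, DOOR, START = 0, 4, 2
SUBGOALS = [KEY, DOOR]
ACTIONS = [-1, +1]  # move left / right

meta_Q = defaultdict(float)   # Q2[(state, subgoal)]          -- meta-controller
ctrl_Q = defaultdict(float)   # Q1[(position, subgoal, act)]  -- controller

def eps_greedy(value, options, eps):
    """Greedy w.r.t. value() with probability 1 - eps; random tie-breaking."""
    if random.random() < eps:
        return random.choice(options)
    best = max(value(o) for o in options)
    return random.choice([o for o in options if value(o) == best])

def episode(eps_meta=0.1, eps_ctrl=0.1, alpha=0.5, gamma=0.99, max_steps=100):
    pos, has_key, steps, ret = START, False, 0, 0.0
    while steps < max_steps:
        s = (pos, has_key)
        # Meta-controller picks the next sub-goal epsilon-greedily.
        goal = eps_greedy(lambda g: meta_Q[(s, g)], SUBGOALS, eps_meta)
        steps += 1                  # count the decision, so the loop terminates
        extrinsic, done = 0.0, False
        # Controller acts until the sub-goal is reached (or time runs out).
        while pos != goal and steps < max_steps and not done:
            prev = pos
            a = eps_greedy(lambda x: ctrl_Q[(prev, goal, x)], ACTIONS, eps_ctrl)
            pos = min(DOOR, max(KEY, prev + a))
            steps += 1
            if pos == KEY:
                has_key = True
            r_ext = 1.0 if (pos == DOOR and has_key) else 0.0
            done = r_ext > 0.0      # episode ends when the door opens
            r_int = 1.0 if pos == goal else 0.0   # intrinsic: sub-goal reached?
            # Q-learning update for the controller on the intrinsic reward.
            target = r_int + gamma * max(ctrl_Q[(pos, goal, x)] for x in ACTIONS)
            ctrl_Q[(prev, goal, a)] += alpha * (target - ctrl_Q[(prev, goal, a)])
            extrinsic += r_ext
            ret += r_ext
        # Q-learning update for the meta-controller on the extrinsic reward
        # gathered while this sub-goal was active.
        s2 = (pos, has_key)
        target = extrinsic + gamma * max(meta_Q[(s2, g)] for g in SUBGOALS)
        meta_Q[(s, goal)] += alpha * (target - meta_Q[(s, goal)])
        if done:
            break
    return ret
```

The failure mode I describe above shows up here if the environment hands the agent the key "for free" after a reset: the meta-controller then sees extrinsic reward for the door sub-goal and never needs to credit the key sub-goal.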

Could anyone show me how to train this agent in a one-life setting?

nbro
zhma
  • When you say HDQN, are you referring to any particular hierarchical DQN algorithm? If yes, please, provide a link to the paper. – nbro Aug 25 '20 at 10:28
  • Here is the [link](https://arxiv.org/abs/1604.06057). In my implementation, every sub-goal is handled by its own neural network. The architecture is the same as that of the paper. – zhma Aug 27 '20 at 06:17

0 Answers