
I recently tried to reproduce the results of double Q-learning, but my results are not satisfying. I have also compared double Q-learning with Q-learning on Taxi-v3, FrozenLake (without slipperiness), Roulette-v0, etc., and Q-learning outperforms double Q-learning in all of these environments.

I am not sure whether there is something wrong with my implementation, since most materials about double Q-learning actually focus on double DQN. While checking my code, I also wondered: is there a toy example that clearly demonstrates the advantage of double Q-learning?
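For reference, this is roughly the tabular double Q-learning update I understand from van Hasselt (2010); it is a minimal sketch, not my exact code, and the environment, hyperparameters, and episode count are placeholders rather than tuned values:

```python
import numpy as np
import gym

# Minimal tabular double Q-learning sketch (van Hasselt, 2010).
# Environment, hyperparameters and episode count are placeholders.
env = gym.make("Taxi-v3")
n_states, n_actions = env.observation_space.n, env.action_space.n
Q1 = np.zeros((n_states, n_actions))
Q2 = np.zeros((n_states, n_actions))
alpha, gamma, eps, episodes = 0.1, 0.99, 0.1, 5000
rng = np.random.default_rng(0)

for _ in range(episodes):
    s = env.reset()                        # old gym API: reset() returns obs
    done = False
    while not done:
        # act epsilon-greedily w.r.t. the sum of both tables
        if rng.random() < eps:
            a = env.action_space.sample()
        else:
            a = int(np.argmax(Q1[s] + Q2[s]))
        s2, r, done, _ = env.step(a)       # old gym API: 4-tuple step
        # flip a coin: one table selects the greedy next action,
        # the other table evaluates it
        if rng.random() < 0.5:
            a_star = int(np.argmax(Q1[s2]))
            target = r + gamma * (0.0 if done else Q2[s2, a_star])
            Q1[s, a] += alpha * (target - Q1[s, a])
        else:
            a_star = int(np.argmax(Q2[s2]))
            target = r + gamma * (0.0 if done else Q1[s2, a_star])
            Q2[s, a] += alpha * (target - Q2[s, a])
        s = s2
```

If my implementation deviates from this (e.g. in how the greedy action is selected versus evaluated), that could explain the gap I am seeing.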

    Have you used the same parameters and hyper-parameters as the ones in [the paper](https://papers.nips.cc/paper/2010/file/091d584fced301b442654dd8c23b3fc9-Paper.pdf) for the roulette environment? – nbro Dec 23 '20 at 09:31
  • @nbro Thanks for your comment. I haven't tested the roulette setup from the paper, as I don't fully understand its environment settings. I'll consider giving it a try another day. – Allen_FrCh Dec 23 '20 at 11:38
  • Have you seen the example from the Sutton and Barto book? (sketched below) – David Dec 26 '20 at 23:05
  • @DavidIreland Not yet. Thanks for the suggestion. – Allen_FrCh Dec 27 '20 at 07:11
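
The example David points to is, as far as I can tell, the maximization-bias MDP from Sutton and Barto (2nd ed., Example 6.7). Below is a rough sketch of that environment from my reading of the book; the number of actions in state B (here 10) and the helper structure are my own assumptions:

```python
import numpy as np

# Maximization-bias MDP (Sutton & Barto, Example 6.7), states A and B.
# From A: "right" terminates with reward 0; "left" moves to B with reward 0.
# From B: every action terminates with reward drawn from N(-0.1, 1), so the
# true value of "left" is negative, yet plain Q-learning's max operator makes
# it look attractive early in training; double Q-learning largely avoids this.
A, B = 0, 1
N_B_ACTIONS = 10            # assumed; the book only says "many" actions

def step(state, action, rng):
    """Return (next_state, reward, done). In A, action 0 = left, 1 = right."""
    if state == A:
        if action == 1:                      # right: terminate with reward 0
            return None, 0.0, True
        return B, 0.0, False                 # left: move to B with reward 0
    # any action in B terminates with a noisy negative-mean reward
    return None, rng.normal(-0.1, 1.0), True
```

Plotting the fraction of episodes in which the agent takes "left" from A (with epsilon around 0.1) should show plain Q-learning staying well above the epsilon-greedy floor for much longer than double Q-learning, which seems like the kind of toy example I was asking about.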

0 Answers