I recently tried to reproduce the results of double Q-learning, but my results are not satisfying. I have also compared double Q-learning against Q-learning in Taxi-v3, FrozenLake with slipperiness disabled, Roulette-v0, and a few other environments, and Q-learning outperforms double Q-learning in all of them.
I am not sure whether there is something wrong with my implementation, since most materials about double Q-learning actually focus on double DQN. While I keep checking my code, I also wonder: is there a toy example that clearly demonstrates the advantage of double Q-learning over Q-learning?
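For reference, here is a minimal sketch of the tabular double Q-learning update I am using, following van Hasselt (2010); the table names, hyperparameters, and the helper for action selection are just illustrative, not taken from any particular library:

```python
import numpy as np

def double_q_update(QA, QB, s, a, r, s_next, done,
                    alpha=0.1, gamma=0.99, rng=np.random):
    """One tabular double Q-learning step (van Hasselt, 2010).

    QA, QB: (n_states, n_actions) arrays. Names and default
    hyperparameters are illustrative.
    """
    if rng.random() < 0.5:
        # Update A: select the greedy action with QA, evaluate it with QB
        a_star = np.argmax(QA[s_next])
        target = r + (0.0 if done else gamma * QB[s_next, a_star])
        QA[s, a] += alpha * (target - QA[s, a])
    else:
        # Update B: the roles of the two tables are swapped
        a_star = np.argmax(QB[s_next])
        target = r + (0.0 if done else gamma * QA[s_next, a_star])
        QB[s, a] += alpha * (target - QB[s, a])

def select_action(QA, QB, s, eps, rng=np.random):
    # Behaviour policy: epsilon-greedy on the sum of both tables
    if rng.random() < eps:
        return rng.randint(QA.shape[1])
    return int(np.argmax(QA[s] + QB[s]))
```

In particular, I want to confirm that the behaviour policy should act on the combined tables, while each update uses one table to select the greedy action and the other to evaluate it. Does this match the intended algorithm, or am I missing something?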