I recently tried to reproduce the results of double Q-learning, but my results are not satisfying. I have also compared double Q-learning against Q-learning in Taxi-v3, FrozenLake with slipperiness disabled, Roulette-v0, and a few other environments, and Q-learning outperforms double Q-learning in all of them.
I am not sure whether there is something wrong with my implementation, since most materials about double Q-learning actually focus on double DQN. While I keep checking my code, I also wonder: is there a toy example that clearly demonstrates the advantage of double Q-learning over Q-learning?
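For reference, here is a minimal sketch of the tabular double Q-learning update I am using, following van Hasselt (2010); the table names, hyperparameters, and the helper for action selection are just illustrative, not taken from any particular library:

```python
import numpy as np

def double_q_update(QA, QB, s, a, r, s_next, done,
                    alpha=0.1, gamma=0.99, rng=np.random):
    """One tabular double Q-learning step (van Hasselt, 2010).

    QA, QB: (n_states, n_actions) arrays. Names and default
    hyperparameters are illustrative.
    """
    if rng.random() < 0.5:
        # Update A: select the greedy action with QA, evaluate it with QB
        a_star = np.argmax(QA[s_next])
        target = r + (0.0 if done else gamma * QB[s_next, a_star])
        QA[s, a] += alpha * (target - QA[s, a])
    else:
        # Update B: the roles of the two tables are swapped
        a_star = np.argmax(QB[s_next])
        target = r + (0.0 if done else gamma * QA[s_next, a_star])
        QB[s, a] += alpha * (target - QB[s, a])

def select_action(QA, QB, s, eps, rng=np.random):
    # Behaviour policy: epsilon-greedy on the sum of both tables
    if rng.random() < eps:
        return rng.randint(QA.shape[1])
    return int(np.argmax(QA[s] + QB[s]))
```

In particular, I want to confirm that the behaviour policy should act on the combined tables, while each update uses one table to select the greedy action and the other to evaluate it. Does this match the intended algorithm, or am I missing something?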