For questions related to the tabular version of the double Q-learning algorithm, introduced in "Double Q-learning" (NIPS 2010) by Hado van Hasselt.
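For reference, the update rule the tag refers to can be sketched as follows. This is a minimal illustration, not the paper's experimental setup: the `dict`-of-`dict`s table layout, the example transition, and the hyperparameter values are all arbitrary choices.

```python
import random

def double_q_update(QA, QB, s, a, r, s_next, alpha=0.1, gamma=0.99):
    """One tabular double Q-learning update step.

    A coin flip decides which table is updated; the greedy next action
    is selected with one table and evaluated with the other, which is
    what decouples action selection from action evaluation.
    """
    if random.random() < 0.5:
        a_star = max(QA[s_next], key=QA[s_next].get)  # select with QA
        QA[s][a] += alpha * (r + gamma * QB[s_next][a_star] - QA[s][a])
    else:
        b_star = max(QB[s_next], key=QB[s_next].get)  # select with QB
        QB[s][a] += alpha * (r + gamma * QA[s_next][b_star] - QB[s][a])

# Tiny illustrative transition: state 0, action 0, reward 1, next state 1.
random.seed(0)
QA = {0: {0: 0.0, 1: 0.0}, 1: {0: 1.0, 1: 0.0}}
QB = {0: {0: 0.0, 1: 0.0}, 1: {0: 0.0, 1: 2.0}}
double_q_update(QA, QB, 0, 0, 1.0, 1)
```

Note that only one of the two tables changes per step, which is why the sample transition above leaves the other table untouched.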
Questions tagged [double-q-learning]
6 questions
7
votes
1 answer
Deep Q-Learning "catastrophic drop" reasons?
I am implementing some "classical" papers in model-free RL, such as DQN, Double DQN, and Double DQN with Prioritized Replay.
Across the various models I'm running on CartPole-v1 with the same underlying NN, I am noticing that all three of the above exhibit a…

Virus
- 71
- 1
- 5
5
votes
1 answer
Why does regular Q-learning (and DQN) overestimate the Q values?
The motivation for introducing double DQN (and double Q-learning) is that regular Q-learning (or DQN) can overestimate Q values, but is there a brief explanation of why this overestimation occurs?

ground clown
- 111
- 2
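The core of the usual answer can be demonstrated in a few lines (a sketch added here for illustration, not taken from the question): even when every action's true value is 0 and each estimate is unbiased zero-mean noise, taking the max over the estimates is biased upward, since E[max_a Q̂(s, a)] ≥ max_a E[Q̂(s, a)].

```python
import random

# True Q values are all zero, so the true max is 0. Each estimate is
# unbiased standard-normal noise, yet the max over estimates is
# systematically positive (around 1.5 for 10 standard normals).
random.seed(42)
n_actions, n_trials = 10, 10_000
avg_max = sum(
    max(random.gauss(0.0, 1.0) for _ in range(n_actions))
    for _ in range(n_trials)
) / n_trials
```

Here `avg_max` comes out well above 0, which is exactly the overestimation that double Q-learning is designed to reduce.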
2
votes
1 answer
How to embed game grid state with walls as an input to neural network
I've read most of the posts here on this subject; however, most of them deal with game boards that have only two categories of single pieces and no walls.
My game board has walls, and multiple instances of food.…

Arlo Rostirolla
- 31
- 1
1
vote
1 answer
Q-learning achieves small reward in simple dice game
I am trying to train a Q learning agent on the following game: The states are parametrised by an integer $S \geq 0$ (representing the sum of the previous die rolls). In each step the player can choose to roll a die or quit the game. Whenever the…

deepfloe
- 111
- 2
0
votes
1 answer
Does "number of actions" refer to the number of actions taken or size of the action space?
In the original DDQN article (https://arxiv.org/pdf/1509.06461.pdf), the phrase "number of actions" is used twice:
first, in the following context;
secondly, in Theorem 1.
I have a hard time understanding how the phrase is being used, or if it…

GeorgeWTrump
- 37
- 5
0
votes
0 answers
Is there any toy example that can exemplify the performance of double Q-learning?
I recently tried to reproduce the results of double Q-learning. However, the results are not satisfying. I have also tried to compare double Q-learning with Q-learning on Taxi-v3, non-slippery FrozenLake, Roulette-v0, etc. But Q-learning…

Allen_FrCh
- 1
- 1
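One classic toy setting where double Q-learning's advantage shows up is the maximisation-bias example (the kind discussed in van Hasselt's paper and in Sutton & Barto). Stripped of the MDP, it reduces to comparing the single and double estimators of max_a E[R(a)]; the action counts and sample sizes below are arbitrary illustrative choices.

```python
import random

# Every action's true mean reward is 0, so the true max is 0.
# Single estimator: max over one set of sample means (biased upward).
# Double estimator: pick the argmax on set A, evaluate it on set B
# (unbiased for the value of the chosen action).
random.seed(0)
n_actions, n_samples, n_trials = 8, 10, 2_000
single_est = double_est = 0.0
for _ in range(n_trials):
    means_a = [sum(random.gauss(0, 1) for _ in range(n_samples)) / n_samples
               for _ in range(n_actions)]
    means_b = [sum(random.gauss(0, 1) for _ in range(n_samples)) / n_samples
               for _ in range(n_actions)]
    single_est += max(means_a)
    best = max(range(n_actions), key=lambda a: means_a[a])
    double_est += means_b[best]
single_est /= n_trials
double_est /= n_trials
```

With these settings `single_est` lands clearly above 0 while `double_est` stays near the true value of 0, which is the behaviour a toy MDP comparison of Q-learning versus double Q-learning should reproduce.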