Questions tagged [double-dqn]

For questions about the double DQN model introduced in the paper "Deep Reinforcement Learning with Double Q-learning" (2015) by Hado van Hasselt et al.

22 questions
11
votes
1 answer

What exactly is the advantage of double DQN over DQN?

I started looking into the double DQN (DDQN). Apparently, the difference between DDQN and DQN is that in DDQN we use the main (online) network for action selection and the target network for evaluating that action's Q-value. However, I don't understand why…
Chukwudi
  • 349
  • 2
  • 7
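The selection/evaluation split this question describes can be sketched in a few lines. This is a minimal NumPy illustration with made-up values, not any particular implementation:

```python
import numpy as np

# Hypothetical Q-values for the next state s' from the two networks.
q_online = np.array([1.0, 2.5, 0.3])  # main (online) network: used to SELECT the action
q_target = np.array([1.2, 2.0, 2.2])  # target network: used to EVALUATE the Q-value

reward, gamma = 1.0, 0.99

# Vanilla DQN: the target network both selects and evaluates (a single max).
dqn_target = reward + gamma * q_target.max()          # 1 + 0.99 * 2.2 = 3.178

# Double DQN: select with the online network, evaluate with the target network.
best_action = int(q_online.argmax())                  # action index 1
ddqn_target = reward + gamma * q_target[best_action]  # 1 + 0.99 * 2.0 = 2.98
```

Decoupling selection from evaluation is the point: a single network's noisy maximum tends to overestimate, whereas the action picked by one network is scored by an independent estimate.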
8
votes
2 answers

Can DQN perform better than Double DQN?

I'm training both DQN and double DQN in the same environment, but DQN performs significantly better than double DQN. As I've seen in the double DQN paper, double DQN should perform better than DQN. Am I doing something wrong, or is this possible?
Angelo
  • 201
  • 2
  • 16
5
votes
1 answer

Why does regular Q-learning (and DQN) overestimate the Q values?

The motivation for introducing double DQN (and double Q-learning) is that regular Q-learning (or DQN) can overestimate the Q-values, but is there a brief explanation of why they are overestimated?
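The overestimation asked about here comes from taking a max over noisy estimates: even when every action's true value is identical, the maximum of the estimates is biased upward, since E[max] ≥ max E. A quick NumPy simulation (illustrative values, not from the paper) makes this visible:

```python
import numpy as np

rng = np.random.default_rng(0)

# All 5 actions have the same TRUE value of 0; our estimates are just noise.
n_actions, n_trials = 5, 100_000
estimates = rng.normal(loc=0.0, scale=1.0, size=(n_trials, n_actions))

# Single estimator (Q-learning style): max over noisy estimates.
# Biased upward even though every true value is 0.
single_max = estimates.max(axis=1).mean()

# Double estimator: pick the argmax with one sample set,
# evaluate that action with an independent sample set.
second = rng.normal(loc=0.0, scale=1.0, size=(n_trials, n_actions))
idx = estimates.argmax(axis=1)
double_est = second[np.arange(n_trials), idx].mean()

print(single_max)  # substantially above the true max of 0
print(double_est)  # close to 0
```

The single max lands well above zero on average, while the double estimator stays near the true value, which is exactly the effect double Q-learning exploits.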
4
votes
1 answer

How can I ensure convergence of DDQN, if the true Q-values for different actions in the same state are very close?

I am applying a Double DQN algorithm to a highly stochastic environment where some of the actions in the agent's action space have very similar "true" Q-values (i.e. the expected future reward from either of these actions in the current state is…
3
votes
1 answer

Finding the true Q-values in Gymnasium

I'm very interested in the true Q-values of state-action pairs in the classic control environments in Gymnasium. Contrary to the usual goal, the ordering of the Q-values itself is irrelevant; a near-accurate estimate of the Q-values is…
Mark B
  • 33
  • 3
3
votes
1 answer

Why do we minimise the loss between the target Q values and 'local' Q values?

I have a question regarding the loss function of target networks and current (online) networks. I understand the action-value function. What I am unsure about is why we seek to minimise the loss between the qVal for the next state in our target…
user9317212
  • 161
  • 2
  • 10
3
votes
1 answer

How to compute the target for double Q-learning update step?

I've already read the original paper about double DQN, but I couldn't find a clear, practical explanation of how the target $y$ is computed, so here is how I interpreted the method (say there are 3 possible actions: 1, 2, 3): For each experience…
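One common reading of the paper's update, written for a small batch of transitions, can be sketched as below. All numbers and array names are made up for illustration:

```python
import numpy as np

gamma = 0.99
# Hypothetical batch of 2 transitions; 3 possible actions per state.
rewards = np.array([0.0, 1.0])
dones   = np.array([False, True])
q_online_next = np.array([[0.5, 1.5, 1.0],   # online net Q(s', .): selects a*
                          [2.0, 0.1, 0.3]])
q_target_next = np.array([[0.4, 1.2, 2.0],   # target net Q(s', .): evaluates a*
                          [1.8, 0.2, 0.5]])

a_star = q_online_next.argmax(axis=1)            # greedy actions: [1, 0]
next_vals = q_target_next[np.arange(2), a_star]  # their target-net values: [1.2, 1.8]

# Terminal transitions get no bootstrap term.
y = rewards + gamma * next_vals * (~dones)       # y[0] = 0 + 0.99 * 1.2 = 1.188
```

The target $y$ is then regressed against the online network's $Q(s, a)$ for the action actually taken in each transition.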
2
votes
0 answers

Update Rule with Deep Q-Learning (DQN) for 2-player games

I am wondering how to correctly implement the DQN algorithm for two-player games such as Tic Tac Toe and Connect 4. While my algorithm is mastering Tic Tac Toe relatively quickly, I cannot get great results for Connect 4. The agent is learning to…
2
votes
0 answers

Can DQN outperform DoubleDQN?

I found a similar post about this issue, but unfortunately I did not find a proper answer. Are there any references where DQN performs better than Double DQN, i.e., where Double DQN does not improve on DQN?
2
votes
1 answer

How does the target network in double DQNs find the maximum Q value for each action?

I understand that the neural network takes the states as inputs and outputs the Q-value for each state-action pair. However, in order to compute the target and update its weights, we need to calculate the maximum Q-value for the next…
1
vote
0 answers

DDQN Snake keeps crashing into the wall

Edit: I managed to fix this by changing the optimizer to SGD. I am very new to reinforcement learning, and I attempted to create a DDQN for the game snake but for some reason it keeps learning to crash into the wall. I've tried changing the…
1
vote
0 answers

What Kind of Reinforcement Learning Algorithms Can Be Used when the Action Space is Unfeasibly Large?

I know the deep Q-network as an $S \times A$ DNN which maps the $S$-dimensional state space to Q-values of $A$ distinct actions. In my problem, the action space is still discrete and finite, but depending on some parameters (e.g. number of users in a…
Della
  • 111
  • 2
1
vote
1 answer

How Come My (D)DQN Fails To Learn?

I am currently trying to teach a (D)DQN algorithm to play a 10x10 GridWorld game, so I can compare the two as I increase the number of moves the agent can take. The rewards are as follows: step = -1, key = +100, door = +100, death wall = -100. See the…
1
vote
0 answers

DDQN agent in Othello (Reversi) game struggles to learn

This is my first question on this forum, and I would like to greet everyone. I am trying to implement a DDQN agent playing the Othello (Reversi) game. I have tried multiple things, but the agent, which seems to be properly initialized, does not learn…