5

I already know deep RL, but to understand it more deeply I want to know why we need 2 networks in deep RL. What does the target network do? I know there is a lot of mathematics behind this, but I want to understand deep Q-learning thoroughly, because I am about to make some changes to the deep Q-learning algorithm (i.e. invent a new one). Can you help me understand intuitively what happens when a deep Q-learning algorithm runs?

nbro
dato nefaridze
  • 2
    I think this question goes in exactly the same direction: [Why does adding another network help in double DQN?](https://ai.stackexchange.com/questions/22443/why-does-adding-another-network-help-in-double-dqn). – Daniel B. Jul 15 '20 at 20:54
  • 1
    By the way, here is a video that also explains it nicely: https://youtu.be/xVkPh9E9GfE?t=171 I can only recommend that online lecture series. – Daniel B. Jul 15 '20 at 21:01

1 Answer

6

In the DQN presented in the original paper, the loss for the Q-network is $\left(r_t + \gamma \max_aQ(s_{t+1},a;\theta^-) - Q(s_t,a_t; \theta)\right)^2$, where $\gamma$ is the discount factor and $\theta^-$ is an older copy of the parameters that is only refreshed (set equal to $\theta$) every $C$ updates. The Q-network with parameters $\theta^-$ is the target network, and $r_t + \gamma \max_aQ(s_{t+1},a;\theta^-)$ is the update target.

If you didn't use this target network, i.e. if your loss was $\left(r_t + \gamma \max_aQ(s_{t+1},a;\theta) - Q(s_t,a_t; \theta)\right)^2$, then learning would become unstable because the target, $r_t + \gamma \max_aQ(s_{t+1},a;\theta)$, and the prediction, $Q(s_t,a_t; \theta)$, are not independent, as they both rely on $\theta$: every gradient step that moves the prediction also moves the target.
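To make the two losses concrete, here is a minimal sketch in plain Python. It assumes a tiny linear Q-function in place of a neural network, and the names `q_values`, `td_loss` and `sync_if_due` are made up for this illustration, not taken from the paper. It also takes a discount factor `gamma` (pass `1.0` for the undiscounted form).

```python
def q_values(params, state):
    """Linear Q-function (an assumption): one weight vector per action."""
    return [sum(w * x for w, x in zip(weights, state)) for weights in params]


def td_loss(theta, theta_target, s, a, r, s_next, gamma):
    """Squared TD error; the target is computed with theta_target only."""
    target = r + gamma * max(q_values(theta_target, s_next))
    prediction = q_values(theta, s)[a]
    return (target - prediction) ** 2


def sync_if_due(theta, theta_target, step, C):
    """Every C updates, replace the target parameters with a copy of theta."""
    if step % C == 0:
        return [list(weights) for weights in theta]
    return theta_target


# Two actions, two state features; all numbers are illustrative.
theta = [[1.0, 0.0], [0.0, 1.0]]
theta_target = [[0.0, 0.0], [0.0, 0.0]]
s, a, r, s_next = [1.0, 0.0], 0, 1.0, [0.0, 1.0]

# With a separate target network the target depends only on theta_target.
loss_with_target_net = td_loss(theta, theta_target, s, a, r, s_next, gamma=0.5)

# Passing theta for both arguments gives the variant without a target network.
loss_without_target_net = td_loss(theta, theta, s, a, r, s_next, gamma=0.5)
```

The only difference between the two calls is which parameters produce the target; in a real implementation you would also block gradients from flowing through the target term.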

A nice analogy I saw once was that it is akin to a dog chasing its own tail: it will never catch it because the target is non-stationary, and this non-stationarity is exactly what the dependence between the target and the prediction causes.
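The tail-chasing effect can be seen in a toy example. The sketch below is only an illustration under assumptions: a one-weight linear Q-function, a single transition that loops back to the same state, and hypothetical helper names (`td_target`, `semi_gradient_step`). It takes one update step and compares the target computed from the live parameters with the target computed from a frozen copy.

```python
def q_values(params, state):
    """Linear Q-function (an assumption): one weight vector per action."""
    return [sum(w * x for w, x in zip(weights, state)) for weights in params]


def td_target(params, r, s_next, gamma):
    """Bootstrapped target computed from whichever parameters are passed in."""
    return r + gamma * max(q_values(params, s_next))


def semi_gradient_step(theta, target, s, a, lr):
    """Move Q(s, a) towards a target that is treated as a constant.

    The factor 2 from differentiating the squared error is folded into lr.
    """
    td_error = target - q_values(theta, s)[a]
    updated = [list(weights) for weights in theta]
    updated[a] = [w + lr * td_error * x for w, x in zip(theta[a], s)]
    return updated


# One action, one feature, and a transition from a state back to itself.
s = s_next = [1.0]
a, r, gamma, lr = 0, 1.0, 0.9, 0.5

theta = [[0.0]]
theta_frozen = [[0.0]]  # plays the role of the target network parameters

target_before = td_target(theta, r, s_next, gamma)
theta = semi_gradient_step(theta, target_before, s, a, lr)

# Without a target network the target is recomputed from the updated theta.
moving_target = td_target(theta, r, s_next, gamma)

# With a target network the target still comes from the frozen copy.
frozen_target = td_target(theta_frozen, r, s_next, gamma)
```

Here the update that pulls the prediction towards the target also pushes `moving_target` further away, while `frozen_target` stays where it was until the next sync.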

nbro
David
  • 1
    Given that this is a duplicate but your answer is valuable, I think it would have been better to write the answer under https://ai.stackexchange.com/q/6982/2444. – nbro Jul 16 '20 at 03:16