
In deep Q-learning, both $Q(s, a)$ and $Q'(s, a)$ are estimated by the neural network itself. In supervised learning, the target is a true, unbiased value, but this isn't the case in reinforcement learning. So how can we be sure that deep Q-learning converges, and how do we know that the target Q-values are accurate?
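
To make the concern concrete, here is a minimal sketch of how the target is typically built from the network's own estimates. It assumes that $Q'$ refers to a target network and that the target is the usual one-step bootstrapped value $r + \gamma \max_{a'} Q'(s', a')$. The function names, the discount factor, and the numbers are hypothetical and only for illustration.

```python
# Minimal sketch of a bootstrapped Q-learning target (assumed standard form).
# All names and values here are hypothetical, not taken from any library.

GAMMA = 0.9  # discount factor (assumed value)


def td_target(reward, next_q_values, done, gamma=GAMMA):
    """Return r + gamma * max_a' Q'(s', a'), or just r if the episode ended."""
    if done:
        return reward
    return reward + gamma * max(next_q_values)


def td_error(q_sa, reward, next_q_values, done, gamma=GAMMA):
    """Difference between the bootstrapped target and the current estimate Q(s, a)."""
    return td_target(reward, next_q_values, done, gamma) - q_sa


# Q'(s', .) as estimated by the target network: made-up numbers, not ground truth.
q_target_next = [0.5, 2.0, 1.0]
y = td_target(reward=1.0, next_q_values=q_target_next, done=False)
delta = td_error(q_sa=2.0, reward=1.0, next_q_values=q_target_next, done=False)
print(y, delta)
```

The only quantity here that comes from the environment is the reward; everything else in the target is itself an estimate, which is exactly what the question is asking about.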

nbro
Chukwudi
See https://ai.stackexchange.com/q/21053/2444 and https://ai.stackexchange.com/q/11679/2444. – nbro Aug 03 '20 at 14:45

0 Answers