How can deep Q-learning converge if the targets may not be correct?

Asked Aug 02 '20 at 17:18

Active Aug 02 '20 at 21:03

In deep Q-learning, both $Q(s, a)$ and the target value $Q'(s, a)$ are estimated by the neural network itself. In supervised learning, by contrast, the target is a true, unbiased value. That isn't the case in reinforcement learning. So how can we be sure that deep Q-learning converges, and how do we know that the target Q-values are accurate?
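To make the concern concrete, here is a minimal sketch of how I understand the target to be built, $y = r + \gamma \max_{a'} Q'(s', a')$. It uses a small table in place of a neural network, so it says nothing about convergence with function approximation. The two-state chain, the learning rate of 0.5, `GAMMA`, and the helper `td_target` are all made up for illustration.

```python
import numpy as np

GAMMA = 0.9  # assumed discount factor


def td_target(reward, next_q_values, done, gamma=GAMMA):
    """Bootstrapped target: r + gamma * max_a' Q'(s', a'); just r at a terminal step."""
    if done:
        return reward
    return reward + gamma * float(np.max(next_q_values))


# Hypothetical chain with one action per state:
# state 0 -> state 1 with reward 0, then state 1 -> terminal with reward 1.
# Each tuple is (state, action, reward, next_state, done).
transitions = [(0, 0, 0.0, 1, False), (1, 0, 1.0, 1, True)]

# Deliberately wrong initial estimates, shape (n_states, n_actions).
q = np.array([[5.0], [-3.0]])

for _ in range(50):
    q_target = q.copy()  # frozen copy playing the role of Q'
    for s, a, r, s_next, done in transitions:
        y = td_target(r, q_target[s_next], done)  # target built from an estimate
        q[s, a] += 0.5 * (y - q[s, a])            # move Q(s, a) toward the target

print(q)
```

In this tabular toy the estimates end up near 0.9 for state 0 and 1.0 for state 1, even though the early targets are wrong, because the observed reward anchors the terminal step and that value then propagates backwards. My question is what, if anything, plays that role once $Q$ and $Q'$ are neural networks.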

edited Aug 02 '20 at 21:03

nbro

asked Aug 02 '20 at 17:18

Chukwudi

See https://ai.stackexchange.com/q/21053/2444 and https://ai.stackexchange.com/q/11679/2444. – nbro Aug 03 '20 at 14:45

0 Answers