What kind of reinforcement learning method does DeepMind's AlphaGo use to beat the best human Go player?

Asked Dec 23 '20 at 06:14

Active Dec 24 '20 at 01:25

Viewed 94 times

In reinforcement learning, methods are commonly divided into model-based and model-free ones. They are also commonly divided into policy-based and value-based methods.

DeepMind's AlphaGo has beaten the best human Go player. What kind of reinforcement learning method does it use? Why is this particular method appropriate for the game of Go?

asked Dec 23 '20 at 06:14

user781486

I think their program is not based on a single algorithm; they combined several algorithms in its design. They did, however, use deep reinforcement learning (several methods are available; see, for instance, "Human-level control through deep reinforcement learning"). The main difference between deep reinforcement learning and traditional Q-learning is that traditional (tabular) Q-learning is only practical when there are few states and actions, say 10 or 20, whereas deep reinforcement learning approximates the value function with a neural network and so can handle much larger state spaces. – Reza_va Dec 24 '20 at 01:25
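To make the comment's point about traditional Q-learning concrete, here is a minimal sketch of tabular Q-learning. It is not AlphaGo's method. The environment (a 5-state chain with a reward of 1 at the right end), the function name `q_learning_chain`, and all hyperparameters are hypothetical choices made for illustration. The behavior policy is uniformly random, which is acceptable here because Q-learning is off-policy. The table holds one entry per (state, action) pair, which is why this approach cannot be applied directly to Go, a game with roughly 10^170 legal board positions.

```python
import random


def q_learning_chain(n_states=5, episodes=500, alpha=0.1, gamma=0.9, seed=0):
    """Tabular Q-learning on a hypothetical 1-D chain environment.

    The agent starts in state 0 and receives reward 1 when it reaches the
    last state, which is terminal. Actions: 0 = move left, 1 = move right.
    """
    rng = random.Random(seed)
    moves = (-1, +1)
    # One table entry per (state, action) pair: n_states * 2 numbers in total.
    q = [[0.0, 0.0] for _ in range(n_states)]

    for _ in range(episodes):
        s = 0
        while s != n_states - 1:
            # Uniformly random behavior policy (Q-learning is off-policy).
            a = rng.randrange(2)
            # Moving left from state 0 leaves the agent in state 0.
            s_next = min(max(s + moves[a], 0), n_states - 1)
            r = 1.0 if s_next == n_states - 1 else 0.0
            # Q-learning update; the terminal state's entries stay at 0.
            target = r + gamma * max(q[s_next])
            q[s][a] += alpha * (target - q[s][a])
            s = s_next
    return q


if __name__ == "__main__":
    table = q_learning_chain()
    for state, (left, right) in enumerate(table):
        print(f"state {state}: Q(left)={left:.3f}  Q(right)={right:.3f}")
```

With 5 states and 2 actions the table has only 10 entries, so every entry is visited many times and the greedy policy learns to move right. When the number of states is astronomically large, most entries would never be visited, which is the motivation for replacing the table with a neural network.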


0 Answers