2

In reinforcement learning, there are model-based versus model-free methods. Within model-based ones, there are policy-based and value-based methods.

AlphaGo Deepmind RL model has beaten the best Go human player. What kind of reinforcement model does it use? Why is this particular model appropriate for Go game?

user781486
  • 201
  • 1
  • 5
  • I think their program is not just based on a single algorithm, and they used several algorithms in their design. However they have used deep reinforcement learning (There are several methods available, see for instance: "Human-level control through deep reinforcement learning"). The main difference between deep reinforcement learning and traditional Q- learning is that, tradition Q learning can only learn applicably when we have few states and actions say 10 or 20. – Reza_va Dec 24 '20 at 01:25

0 Answers0