In vanilla Monte Carlo tree search (MCTS), the rollout usually follows a uniform random policy: it takes random actions until the game is finished, and only then is the gathered information backed up.
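
For concreteness, here is a minimal sketch of such a uniform random rollout. The `Nim` class and the `random_rollout` function are hypothetical names invented for illustration (a toy game where players alternately take 1 to 3 stones and whoever takes the last stone wins); they are not taken from any paper or library.

```python
import random


class Nim:
    """Hypothetical toy game: players alternately take 1-3 stones;
    whoever takes the last stone wins."""

    def __init__(self, stones=10, player=1):
        self.stones = stones
        self.player = player  # +1 or -1: the player to move

    def legal_actions(self):
        return [n for n in (1, 2, 3) if n <= self.stones]

    def step(self, action):
        # Return the successor state; the turn passes to the other player.
        return Nim(self.stones - action, -self.player)

    def is_terminal(self):
        return self.stones == 0

    def winner(self):
        # Only meaningful in a terminal state: the player who just moved
        # took the last stone, and that is the opponent of the player to move.
        return -self.player


def random_rollout(state, rng=random):
    """Play uniformly random legal moves until the game ends.

    Returns the winner (+1 or -1); this is the value that vanilla MCTS
    would then back up along the path from the leaf to the root.
    """
    while not state.is_terminal():
        state = state.step(rng.choice(state.legal_actions()))
    return state.winner()
```

Calling `random_rollout(Nim(10))` returns either `1` or `-1`, depending on the random moves drawn.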

I have read the AlphaZero paper (and the AlphaGo Zero paper too), but I didn't find any information on how the rollout is implemented (maybe I missed it).

How is the MCTS rollout implemented in the AlphaGo Zero and AlphaZero algorithms?

nbro
ihavenoidea
  • AlphaZero/Go don't have rollouts. The rollout is replaced with neural network value estimation – mirror2image Nov 03 '19 at 07:40
  • @mi: I thought that the original AlphaGo *did* have a rollout policy, but that one of the simplifying changes in AlphaZero was to completely remove it? – Neil Slater Nov 03 '19 at 09:03
  • @NeilSlater That's correct. The question seems to be about AlphaGo Zero and AlphaZero though, not about AlphaGo :) **AlphaGo**: uses a fast rollout policy (trained like the policy network, but not a large DNN, just a single layer + softmax I believe). **AlphaGo Zero**: no rollouts. **AlphaZero**: no rollouts. – Dennis Soemers Nov 03 '19 at 13:53
  • @mirror2image even though it was only a line, you answered the question -- why not make it slightly more verbose and make it an answer – mshlis Nov 03 '19 at 17:57
  • Thanks for the answers. However, I'm still confused. I understand that they use the value V returned from the network to estimate who won, but how do they know which values of V mean that a player won or lost? As far as I understand, the weight update (the loss function) compares the probability distribution gathered from self-play, and also the difference between the real winner (from MCTS) and the value V. – ihavenoidea Nov 03 '19 at 23:18
  • In short, I'm confused about the V value range returned from the network. Will it be in the [-1,1] range? – ihavenoidea Nov 03 '19 at 23:19
  • @ihavenoidea I have the same question. Have you figured it out? – Daniel Jul 29 '21 at 19:16
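
To illustrate what the comments describe, here is a minimal sketch of expanding a leaf without a rollout: the leaf is scored by a value estimate and that number is backed up directly. In the AlphaGo Zero and AlphaZero papers the value head ends in a tanh, so v lies in [-1, 1], and the training target z is the game outcome (+1 for a win, -1 for a loss, and in AlphaZero 0 for a draw) from the perspective of the player to move. Everything named below is an assumption made for illustration: `Node`, `value_estimate` (a tanh of a dot product standing in for the real network), `backup`, and the convention that each node stores values from the perspective of the player to move at that node.

```python
import math


class Node:
    """Hypothetical search-tree node holding visit statistics."""

    def __init__(self, parent=None):
        self.parent = parent
        self.visits = 0
        self.value_sum = 0.0

    def mean_value(self):
        return self.value_sum / self.visits if self.visits else 0.0


def value_estimate(features, weights):
    """Stand-in for the network's value head (assumption, not the real net).

    tanh squashes the raw score into [-1, 1]: values near +1 mean the
    player to move is expected to win, values near -1 mean a loss.
    """
    return math.tanh(sum(f * w for f, w in zip(features, weights)))


def backup(leaf, v):
    """Propagate v from the leaf to the root, with no rollout involved.

    v is from the perspective of the player to move at the leaf, so the
    sign flips at every level because the players alternate.
    """
    node = leaf
    while node is not None:
        node.visits += 1
        node.value_sum += v
        v = -v
        node = node.parent
```

For example, with `root = Node()` and `child = Node(root)`, calling `backup(child, value_estimate([1.0, -0.5], [0.8, 0.2]))` adds a value in [-1, 1] to `child` and its negation to `root`.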

0 Answers