Questions tagged [alphago-zero]

For questions related to AlphaGo Zero, a version of DeepMind's Go software AlphaGo that uses no data from human games and is stronger than AlphaGo. A generalized version, AlphaZero, beat the 3-day version of AlphaGo Zero 60 games to 40. AlphaGo Zero was introduced in the paper "Mastering the game of Go without human knowledge" (2017) by David Silver et al.

Have a look at the research paper that introduced AlphaGo Zero, "Mastering the game of Go without human knowledge" (2017) by David Silver et al., and at https://en.wikipedia.org/wiki/AlphaGo_Zero.

30 questions
15 votes · 1 answer

Why does the policy network in AlphaZero work?

In AlphaZero, the policy network (or head of the network) maps game states to a distribution of the likelihood of taking each action. This distribution covers all possible actions from that state. How is such a network possible? The possible actions…
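
The mechanism behind the excerpt is simpler than it may look: the head emits one raw score (logit) per action, and the distribution comes from normalising those scores after masking out illegal moves. A minimal sketch of that normalisation, assuming the 362-action layout of 19×19 Go (361 board points plus pass) used in the paper; the masking helper is illustrative, not DeepMind's code:

```python
import numpy as np

def policy_distribution(logits, legal_mask):
    """Normalise raw policy-head scores into a distribution over actions.
    Illegal moves are masked to -inf so they get probability zero.
    Illustrative helper, not DeepMind's code."""
    masked = np.where(legal_mask, logits, -np.inf)
    masked = masked - masked.max()        # shift for numerical stability
    exp = np.exp(masked)
    return exp / exp.sum()

# 362 actions for 19x19 Go: 361 board points plus the pass move.
logits = np.random.randn(362)
legal = np.ones(362, dtype=bool)
pi = policy_distribution(logits, legal)
assert np.isclose(pi.sum(), 1.0)
```

Masked moves get probability exactly zero, so the network itself never has to encode the rules; the search layer supplies the legality mask.
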
11 votes · 1 answer

Why is the merged neural network of AlphaGo Zero more efficient than two separate neural networks?

AlphaGo Zero contains several improvements compared to its predecessors. Architectural details of AlphaGo Zero can be seen in this cheat sheet. One of those improvements is using a single neural network that calculates move probabilities and the…
— Demento
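
For context on the merged design: the separate policy and value networks of the original AlphaGo were combined into one network with a shared trunk and two output heads, so the expensive feature extraction runs once per position and, per the paper, the dual objective also regularises the shared features. A toy PyTorch sketch of that shape, with placeholder layer sizes (the real network is a 20- or 40-block residual tower over 17 input planes):

```python
import torch
import torch.nn as nn

class TwoHeadedNet(nn.Module):
    """Toy sketch of the merged architecture: one shared trunk feeding
    a policy head and a value head. Layer sizes are placeholders; the
    real network is a residual tower of 20 or 40 blocks."""

    def __init__(self, channels=32, board=19):
        super().__init__()
        self.trunk = nn.Sequential(            # shared feature extractor
            nn.Conv2d(17, channels, 3, padding=1), nn.ReLU(),
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(),
        )
        flat = channels * board * board
        self.policy_head = nn.Sequential(      # logits for 361 moves + pass
            nn.Flatten(), nn.Linear(flat, board * board + 1),
        )
        self.value_head = nn.Sequential(       # scalar winner prediction in [-1, 1]
            nn.Flatten(), nn.Linear(flat, 1), nn.Tanh(),
        )

    def forward(self, x):
        features = self.trunk(x)               # computed once, consumed by both heads
        return self.policy_head(features), self.value_head(features)
```
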
7 votes · 3 answers

Would AlphaGo Zero become perfect with enough training time?

Would AlphaGo Zero become theoretically perfect with enough training time? If not, what would be the limiting factor? (By perfect, I mean it always wins the game if possible, even against another perfect opponent.)
6 votes · 0 answers

How is the rollout from the MCTS implemented in both of the AlphaGo Zero and the AlphaZero algorithms?

In the vanilla Monte Carlo tree search (MCTS) implementation, the rollout is usually implemented following a uniform random policy: it takes random actions until the game is finished, and only then is the gathered information backed up. I…
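
For contrast with the question, a uniform-random rollout looks like the sketch below. In the AlphaGo Zero and AlphaZero papers this step is dropped entirely: leaf positions are evaluated by the value head instead of being rolled out. The `state` interface here is hypothetical:

```python
import random

def random_rollout(state):
    """Vanilla-MCTS leaf evaluation: play uniformly random legal moves
    to the end of the game and return the outcome to be backed up.
    The `state` interface (is_terminal, legal_moves, play, outcome)
    is hypothetical, for illustration only."""
    while not state.is_terminal():
        state = state.play(random.choice(state.legal_moves()))
    return state.outcome()
```
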
5 votes · 2 answers

What part of the game is the value network trained to predict a winner on?

The AlphaZero (as well as AlphaGo Zero) papers say they trained the value head of the network by "minimizing the error between the predicted winner and the game winner" throughout its many self-play games. As far as I could tell, further…
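
What the papers do specify is that the value target $z$ is the final winner of the whole game ($\pm 1$), attached to every position sampled from that game, and that it enters the joint loss $l = (z - v)^2 - \pi^\top \log \mathbf{p} + c\lVert\theta\rVert^2$. A sketch of that loss, with `theta_l2` standing in for the weight-norm term:

```python
import numpy as np

def agz_loss(p, v, pi, z, theta_l2, c=1e-4):
    """AlphaGo Zero's joint loss: (z - v)^2 - pi.log(p) + c*||theta||^2.
    z is the final winner (+1/-1) of the finished self-play game,
    attached to every position sampled from it; pi is the MCTS
    visit-count distribution. c = 1e-4 as in the paper; `theta_l2`
    stands in for the squared L2 norm of the network weights."""
    value_loss = (z - v) ** 2              # error between predicted and game winner
    policy_loss = -np.dot(pi, np.log(p))   # cross-entropy against search probabilities
    return value_loss + policy_loss + c * theta_l2
```
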
5 votes · 2 answers

What is the difference between DQN and AlphaGo Zero?

I have already implemented a relatively simple DQN on Pacman. Now I would like to clearly understand the difference between a DQN and the techniques used by AlphaGo Zero/AlphaZero, and I couldn't find a place where the features of both approaches are…
5 votes · 1 answer

What is a "logit probability"?

DeepMind's paper "Mastering the game of Go without human knowledge" states, in the "Neural network architecture" part of its "Methods" section, that the output layer of AlphaGo Zero's policy head is "A fully connected linear layer that outputs a vector of…
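
One common reading, offered as interpretation rather than DeepMind's own wording: the linear layer outputs logits, i.e. unnormalised log-probabilities, which become actual move probabilities once passed through a softmax. A tiny sketch:

```python
import numpy as np

def softmax(logits):
    """Logits are the unnormalised scores a linear layer outputs;
    softmax maps them to a probability distribution."""
    z = logits - logits.max()   # stability shift; leaves the softmax unchanged
    e = np.exp(z)
    return e / e.sum()

logits = np.array([2.0, 0.5, -1.0])   # raw linear-layer outputs
print(softmax(logits))                # non-negative, sums to 1
```
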
4 votes · 1 answer

Would it take 1700 years to run AlphaGo Zero in commodity hardware?

From this link, AlphaGo would take millennia to run on regular hardware. They generated 29 million games for the final result, which means it would take me about 1700 years to replicate this. Are these calculations correct?
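
A quick sanity check of what the quoted figures imply, pure arithmetic rather than a benchmark: 29 million games spread over 1700 years is about 47 games per day, i.e. one self-play game every half hour on the assumed hardware.

```python
# Pure arithmetic on the figures quoted in the question.
games = 29_000_000
years = 1_700
per_day = games / (years * 365.25)         # ~46.7 games per day
minutes_per_game = 24 * 60 / per_day       # ~31 minutes per self-play game
print(f"{per_day:.1f} games/day, one game every {minutes_per_game:.0f} min")
```
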
4 votes · 1 answer

How Does AlphaGo Zero Implement Reinforcement Learning?

AlphaGo Zero (https://deepmind.com/blog/alphago-zero-learning-scratch/) has several key components that contribute to its success: a Monte Carlo tree search algorithm that allows it to better search and learn from the state space of Go, a deep…
3 votes · 1 answer

How does policy network learn in AlphaZero?

I'm currently trying to understand how AlphaZero works. There is one thing about the training of AlphaZero's policy head that confuses me. Basically, in AlphaGo Zero's paper (where the major part of the AlphaZero algorithm is explained), a combined…
3 votes · 1 answer

AlphaGo Zero: does $Q(s_t, a)$ dominate $U(s_t, a)$ in difficult game states?

AlphaGo Zero uses a Monte-Carlo Tree Search where the selection phase is governed by $\operatorname*{argmax}\limits_a\left( Q(s_t, a) + U(s_t, a) \right)$, where: the exploitation parameter is $Q(s_t, a) = \displaystyle…
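
For readers without the paper open, the two terms are $Q(s, a) = W(s, a)/N(s, a)$, the mean value of simulations through the edge, and $U(s, a) = c_{\text{puct}}\, P(s, a)\, \frac{\sqrt{\sum_b N(s, b)}}{1 + N(s, a)}$, so $U$ dominates while an edge is rarely visited and $Q$ takes over as visits accumulate. A sketch of the selection step, with a made-up `node.edges` layout:

```python
import math

def select_action(node, c_puct=1.0):
    """PUCT selection: argmax_a Q(s,a) + U(s,a), with Q = W/N and
    U = c_puct * P(s,a) * sqrt(sum_b N(s,b)) / (1 + N(s,a)).
    `node.edges` maps actions to (N, W, P) tuples; that layout is
    made up for illustration."""
    total_n = sum(n for n, _, _ in node.edges.values())
    best_action, best_score = None, -math.inf
    for action, (n, w, p) in node.edges.items():
        q = w / n if n > 0 else 0.0                     # exploitation term
        u = c_puct * p * math.sqrt(total_n) / (1 + n)   # exploration term
        if q + u > best_score:
            best_action, best_score = action, q + u
    return best_action
```
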
3 votes · 1 answer

Why does AlphaGo Zero select move based on exponentiated visit count?

From the AlphaGo Zero paper, AlphaGo Zero uses an exponentiated visit count from the tree search. Why use visit count instead of the mean action value $Q(s, a)$?
— Cash Lo
3 votes · 2 answers

How does the AlphaGo Zero policy decide what move to execute?

I was going through the AlphaGo Zero paper and I was trying to understand everything, but I just can't figure out this one formula: $$ \pi(a \mid s_0) = \frac{N(s_0, a)^{\frac{1}{\tau}}}{\sum_b N(s_0, b)^{\frac{1}{\tau}}} $$ Could someone decode how…
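
The formula is a temperature-weighted normalisation of the root visit counts $N(s_0, a)$: with $\tau = 1$ moves are sampled in proportion to their visit counts, and as $\tau \to 0$ it collapses onto the most-visited move. A small sketch:

```python
import numpy as np

def move_probabilities(visit_counts, tau):
    """pi(a|s0) proportional to N(s0, a)^(1/tau). tau = 1 samples moves
    in proportion to visit counts; tau -> 0 collapses onto the
    most-visited move."""
    scaled = np.asarray(visit_counts, dtype=float) ** (1.0 / tau)
    return scaled / scaled.sum()

counts = [10, 30, 60]
print(move_probabilities(counts, tau=1.0))   # [0.1, 0.3, 0.6]
print(move_probabilities(counts, tau=0.1))   # almost all mass on the 60-visit move
```
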
3 votes · 1 answer

What is the input to AlphaGo's neural network?

I have been reading an article on AlphaGo and one sentence confused me a little bit, because I'm not sure what it exactly means. The article says: AlphaGo Zero only uses the black and white stones from the Go board as its input, whereas previous…
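
What that sentence refers to, per the paper: the raw input is a $19 \times 19 \times 17$ stack of binary feature planes (the current player's stones over the last 8 positions, the opponent's stones over the same 8 positions, and one constant plane for the colour to play), with no hand-crafted Go features. A sketch of assembling that stack, with illustrative names:

```python
import numpy as np

def build_input(own_history, opp_history, black_to_play):
    """Assemble the 17 x 19 x 19 input described in the paper:
    8 binary planes of the current player's stones over the last 8
    positions, 8 planes of the opponent's stones, and one constant
    colour-to-play plane. Argument names are illustrative; each
    history is a list of eight 19x19 0/1 arrays."""
    colour = np.full((19, 19), 1.0 if black_to_play else 0.0)
    return np.stack(list(own_history) + list(opp_history) + [colour])

empty = [np.zeros((19, 19)) for _ in range(8)]
x = build_input(empty, empty, black_to_play=True)
print(x.shape)   # (17, 19, 19)
```
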
3 votes · 1 answer

Why is Monte Carlo used as the tree search algorithm for AlphaGo?

Could an algorithm better than Monte Carlo tree search have been used for AlphaGo? Why didn't the DeepMind team think of choosing another kind of algorithm, rather than spending time on their neural nets?