Questions tagged [alphazero]

For questions related to DeepMind's AlphaZero, which is a computer program that can play Go, Chess, and Shogi. AlphaZero achieved, within 24 hours of training, a superhuman level of play in these three games by defeating world-champion programs Stockfish, Elmo, and the 3-day version of AlphaGo Zero. AlphaZero was introduced in "Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm" (2017) by David Silver et al.

Have a look at the research paper that introduced AlphaZero, "Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm" (2017) by David Silver et al., and the Wikipedia article https://en.wikipedia.org/wiki/AlphaZero.

70 questions
15 votes • 1 answer

Why does the policy network in AlphaZero work?

In AlphaZero, the policy network (or head of the network) maps game states to a distribution of the likelihood of taking each action. This distribution covers all possible actions from that state. How is such a network possible? The possible actions…
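The fixed-size output the excerpt describes is typically paired with a legality mask at inference time: the network always emits one logit per action in the full action space, and illegal moves are masked out before normalizing. A minimal NumPy sketch of that idea (an illustration, not DeepMind's implementation):

```python
import numpy as np

def masked_policy(logits, legal_mask):
    """Turn raw policy logits over a fixed action space into a
    probability distribution restricted to the legal moves.

    logits     : (num_actions,) raw network outputs
    legal_mask : (num_actions,) boolean, True where the move is legal
    """
    masked = np.where(legal_mask, logits, -np.inf)  # illegal moves get -inf
    masked -= masked.max()                          # numerical stability
    exp = np.exp(masked)
    return exp / exp.sum()

# Toy example: 5 actions, only actions 0 and 3 are legal.
logits = np.array([1.0, 2.0, 0.5, 1.0, -1.0])
mask = np.array([True, False, False, True, False])
probs = masked_policy(logits, mask)
```

Illegal actions end up with exactly zero probability, so the distribution only ever covers the moves available from the current state.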
15 votes • 3 answers

Does Monte Carlo tree search qualify as machine learning?

To the best of my understanding, the Monte Carlo tree search (MCTS) algorithm is an alternative to minimax for searching a tree of nodes. It works by choosing a move (generally, the one with the highest chance of being the best), and then performing…
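The selection rule the excerpt alludes to ("the one with the highest chance of being the best") is, in vanilla MCTS, usually UCB1: average value plus an exploration bonus. A small sketch (the constant and names are illustrative):

```python
import math

def ucb1(child_value_sum, child_visits, parent_visits, c=1.4):
    """UCB1 score used in the selection phase of vanilla MCTS:
    average observed value plus an exploration bonus that shrinks
    as the child is visited more often."""
    if child_visits == 0:
        return float("inf")  # unvisited children are tried first
    exploit = child_value_sum / child_visits
    explore = c * math.sqrt(math.log(parent_visits) / child_visits)
    return exploit + explore
```

At each step of the tree descent, the child with the highest UCB1 score is followed; AlphaZero replaces this rule with a prior-weighted variant (PUCT).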
11 votes • 3 answers

Why were chess experts surprised by AlphaZero's victory against Stockfish?

It was recently brought to my attention that chess experts took the outcome of this now famous match as something of an upset. See: "Chess’s New Best Player Is A Fearless, Swashbuckling Algorithm". As a non-expert on chess and chess AI, my…
DukeZhou • 6,237
10 votes • 1 answer

Is AlphaZero an example of an AGI?

From DeepMind's research paper on arxiv.org: In this paper, we apply a similar but fully generic algorithm, which we call AlphaZero, to the games of chess and shogi as well as Go, without any additional domain knowledge except the rules of the…
Siddhartha • 413
8 votes • 1 answer

Does AlphaZero use Q-Learning?

I was reading the AlphaZero paper Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm, and it seems they don't mention Q-Learning anywhere. So does AZ use Q-Learning on the results of self-play or just a Supervised…
8 votes • 2 answers

How can AlphaZero learn if the tree search stops and restarts before finishing a game?

I am trying to understand how alpha zero works, but there is one point that I have problems understanding, even after reading several different explanations. As I understand it (see for example…
7 votes • 3 answers

Would AlphaGo Zero become perfect with enough training time?

Would AlphaGo Zero become theoretically perfect with enough training time? If not, what would be the limiting factor? (By perfect, I mean it always wins the game if possible, even against another perfect opponent.)
6 votes • 1 answer

How does AlphaZero's move encoding work?

I am a beginner in AI. I'm trying to train a multi-agent RL algorithm to play chess. One issue that I ran into was representing the action space (legal moves/or honestly just moves in general) numerically. I looked up how Alpha Zero represented it,…
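For reference, the AlphaZero paper encodes each chess move as a plane in an 8 x 8 x 73 stack: 73 planes per from-square (56 "queen-style" moves of 8 directions x up to 7 squares, 8 knight moves, and 9 underpromotions), flattening to 4,672 actions. A toy indexing sketch (the helper name is hypothetical):

```python
def chess_action_index(from_square, move_plane):
    """Flat index into an 8 x 8 x 73 chess action space as described
    in the AlphaZero paper: 73 move planes per from-square
    (56 queen-style moves + 8 knight moves + 9 underpromotions).

    from_square : int in [0, 64), the square the piece moves from
    move_plane  : int in [0, 73), which of the 73 move types
    """
    return from_square * 73 + move_plane

# Total size of the encoded action space: 8 * 8 * 73 = 4672.
```

A policy head with 4,672 outputs then covers every encodable move, legal or not; illegal indices are masked at search time.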
6 votes • 0 answers

How is the rollout from the MCTS implemented in both of the AlphaGo Zero and the AlphaZero algorithms?

In the vanilla Monte Carlo tree search (MCTS) implementation, the rollout is usually implemented following a uniform random policy, that is, it takes random actions until the game is finished, and only then is the gathered information backed up. I…
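For contrast with the learned leaf evaluation asked about here (AlphaGo Zero and AlphaZero replace the rollout with a value-head prediction), the vanilla uniform-random rollout can be sketched generically. The game-interface callbacks below are hypothetical:

```python
import random

def random_rollout(state, legal_moves, apply_move, is_terminal, result):
    """Vanilla-MCTS rollout: play uniformly random moves until the
    game ends and return the final result. The four callbacks define
    a generic game interface (all hypothetical names):

    legal_moves(state) -> list of moves
    apply_move(state, move) -> next state
    is_terminal(state) -> bool
    result(state) -> game outcome to back up the tree
    """
    while not is_terminal(state):
        state = apply_move(state, random.choice(legal_moves(state)))
    return result(state)
```

In AlphaGo Zero / AlphaZero there is no such loop at all: the leaf state is fed to the network once and the value-head output is backed up directly.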
5 votes • 2 answers

What part of the game is the value network trained to predict a winner on?

The Alpha Zero (as well as AlphaGo Zero) papers say they trained the value head of the network by "minimizing the error between the predicted winner and the game winner" throughout its many self-play games. As far as I could tell, further…
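Per the papers, the target is the final game outcome z of the self-play game, and the value error is combined with the policy cross-entropy in a single loss (the L2 weight-decay term is omitted here). A NumPy sketch of the per-position loss:

```python
import numpy as np

def alphazero_loss(z, v, pi, p, eps=1e-12):
    """Per-position training loss from the AlphaGo Zero / AlphaZero
    papers, without the weight-decay term:

    z  : scalar game outcome in {-1, 0, +1} from this player's view
    v  : scalar value-head prediction
    pi : (num_actions,) MCTS visit-count distribution (policy target)
    p  : (num_actions,) policy-head probabilities
    """
    value_loss = (z - v) ** 2            # "error between predicted winner and game winner"
    policy_loss = -np.sum(pi * np.log(p + eps))  # cross-entropy to the search policy
    return value_loss + policy_loss
```

Note that every position from a game is labeled with the same final outcome z, regardless of how early in the game it occurred.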
5 votes • 1 answer

Clarifying representation of Neural Network input for Chess Alpha Zero

In the Alpha Zero paper (https://arxiv.org/pdf/1712.01815.pdf) page 13, the input for the NN is described. In the beginning of the page, the authors state that: "The input to the Neural Network is an N x N x (MT + L) image stack [...]" From this, I…
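For chess, the paper's values are N = 8 (board size), M = 14 feature planes per time-step, T = 8 history steps, and L = 7 constant planes (colour, total move count, castling rights, no-progress count), giving an 8 x 8 x 119 stack. A one-line sketch of the shape arithmetic (the helper is illustrative):

```python
def input_stack_shape(board_size, planes_per_step, history_steps, constant_planes):
    """Shape of the N x N x (M*T + L) input stack described in the
    AlphaZero paper: N = board size, M = feature planes per
    time-step, T = history length, L = constant planes."""
    return (board_size,
            board_size,
            planes_per_step * history_steps + constant_planes)

# Chess:  N=8,  M=14, T=8, L=7 -> (8, 8, 119)
# Go:     N=19, M=2,  T=8, L=1 -> (19, 19, 17)
```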
Andrew • 63
5 votes • 2 answers

Is it practical to train AlphaZero or MuZero (for indie games) on a personal computer?

Is it practical/affordable to train an AlphaZero/MuZero engine using a residential gaming PC, or would it take thousands of years of training for the AI to learn enough to challenge humans? I'm having trouble wrapping my head around how much…
Luke W • 53
5 votes • 1 answer

Do AlphaZero/MuZero learn faster in terms of number of games played than humans?

I don't know much about AI and am just curious. From what I read, AlphaZero/MuZero outperform any human chess player after a few hours of training. I have no idea how many chess games a very talented human chess player on average has played before…
220284 • 153
5 votes • 2 answers

How does AlphaZero's MCTS work when starting from the root node?

From the AlphaGo Zero paper, during MCTS, statistics for each new node are initialized as such: ${N(s_L, a) = 0, W (s_L, a) = 0, Q(s_L, a) = 0, P (s_L, a) = p_a}$. The PUCT algorithm for selecting the best child node is $a_t = argmax(Q(s,a) +…
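The truncated selection rule is, in the paper, $a_t = \operatorname{argmax}_a (Q(s,a) + U(s,a))$ with $U(s,a) = c_{\text{puct}} \, P(s,a) \, \sqrt{\sum_b N(s,b)} \, / \, (1 + N(s,a))$. A sketch of the per-child score; note that when every child has N = 0 the exploration term vanishes, which is exactly the root-node subtlety this question asks about:

```python
import math

def puct_score(q, p, n_child, n_parent, c_puct=1.0):
    """PUCT score from the AlphaGo Zero paper, for one child edge:
    q        : Q(s, a), mean action value
    p        : P(s, a), prior from the policy head
    n_child  : N(s, a), visit count of this edge
    n_parent : sum over b of N(s, b), total visits of the parent
    """
    u = c_puct * p * math.sqrt(n_parent) / (1 + n_child)
    return q + u
```

With a fresh root (all counts zero) every child scores q + 0 = 0, so implementations typically break the tie by prior or start counting from the root visit itself.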
sb3 • 137
5 votes • 2 answers

What is the difference between DQN and AlphaGo Zero?

I have already implemented a relatively simple DQN on Pacman. Now I would like to clearly understand the difference between a DQN and the techniques used by AlphaGo zero/AlphaZero and I couldn't find a place where the features of both approaches are…