Questions tagged [muzero]

For questions about the MuZero algorithm proposed in the paper "Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model" (2019).

10 questions
5 votes · 2 answers

Is it practical to train AlphaZero or MuZero (for indie games) on a personal computer?

Is it practical/affordable to train an AlphaZero/MuZero engine using a residential gaming PC, or would it take thousands of years of training for the AI to learn enough to challenge humans? I'm having trouble wrapping my head around how much…
Luke W • 53 • 3
5 votes · 1 answer

Do AlphaZero/MuZero learn faster than humans in terms of the number of games played?

I don't know much about AI and am just curious. From what I read, AlphaZero/MuZero outperform any human chess player after a few hours of training. I have no idea how many chess games a very talented human chess player on average has played before…
220284 • 153 • 4
3 votes · 1 answer

How is MuZero's second binary plane for chess defined?

From the MuZero paper (Appendix E, page 13): In chess, 8 planes are used to encode the action. The first one-hot plane encodes which position the piece was moved from. The next two planes encode which position the piece was moved to: a one-hot…
MuZeroFm • 31 • 2
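On the second binary plane asked about above: one common reading is that it simply flags whether the target square is valid (on the board) at all. A minimal sketch under that reading, where the plane ordering and the `PROMOTIONS` list are my assumptions rather than anything the paper spells out:

```python
import numpy as np

# Assumed layout of the 8 action planes (my reading, not confirmed by the
# paper beyond what the excerpt quotes):
#   plane 0: one-hot square the piece was moved from
#   plane 1: one-hot square the piece was moved to (if on the board)
#   plane 2: all-ones iff the target square is valid (on the board)
#   planes 3-7: one-hot promotion type
PROMOTIONS = ["none", "queen", "rook", "bishop", "knight"]  # assumed order

def encode_action(from_sq, to_sq=None, promotion="none"):
    """from_sq/to_sq are (row, col) pairs; to_sq is None if off the board."""
    planes = np.zeros((8, 8, 8), dtype=np.float32)
    planes[0][from_sq] = 1.0
    if to_sq is not None:
        planes[1][to_sq] = 1.0
        planes[2, :, :] = 1.0  # the "second binary plane": target is valid
    planes[3 + PROMOTIONS.index(promotion), :, :] = 1.0
    return planes

# e.g. the move e2-e4 as (row, col) pairs, no promotion:
action_planes = encode_action(from_sq=(6, 4), to_sq=(4, 4))
```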
3 votes · 1 answer

How does MuZero learn to play well for both sides of a two-player game?

I'm coding my own version of MuZero. However, I don't understand how it is supposed to learn to play well for both players in a two-player game. Take Go, for example. If I use a single MCTS to generate an entire game (to be used in the training stage),…
Ziofil • 128 • 7
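The usual AlphaZero-style resolution of the question above is that the value head always scores a position from the perspective of the player to move, so a single network can play both sides; the backup then negates the value at every ply. A minimal sketch, with node fields like `value_sum` being illustrative rather than MuZero's actual code:

```python
def backup(search_path, leaf_value):
    """search_path: nodes from root to leaf; leaf_value: value predicted
    for the player to move at the leaf."""
    value = leaf_value
    for node in reversed(search_path):
        node.value_sum += value    # Q(s, a) = value_sum / visit_count
        node.visit_count += 1
        value = -value             # parent is the opponent's decision point
```

Training targets are likewise stored from the to-move player's perspective, so the same network learns both sides from a single self-play game.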
2 votes · 2 answers

How are NN outputs set up for games that allow multiple actions each turn and have very large sets of possible actions?

I was looking at an AI coding challenge for a two-player game on a 2D grid of variable size (from one game to the next). Here is a screenshot example of the playfield. Each player has multiple units on the board. In fact, each tile can hold…
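One common workaround for the combinatorial action space described above, not something the MuZero paper itself prescribes, is to factor the policy: the network emits per-tile action logits, and each unit draws its action from the logits at its own tile. A rough sketch with purely illustrative shapes and names:

```python
import numpy as np

def sample_unit_actions(logits, unit_positions, rng=None):
    """logits: (H, W, A) per-tile action logits from the policy head;
    unit_positions: list of (row, col) tiles occupied by your units."""
    rng = rng or np.random.default_rng()
    actions = {}
    for pos in unit_positions:
        z = logits[pos]               # (A,) logits for this tile's unit
        p = np.exp(z - z.max())
        p /= p.sum()                  # softmax over this unit's actions
        actions[pos] = rng.choice(len(p), p=p)
    return actions
```

This treats units as acting independently given the shared board state, which keeps the output size linear in the board area instead of exponential in the number of units.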
2 votes · 1 answer

In the MuZero paper, how does backprop in the MCTS account for the immediate reward from each edge?

On page 12 of this paper: Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model, it describes how MCTS works for the MuZero algorithm. It states in equation 4 that during the 'backup' after a simulation, the mean value (Q) for every…
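My reading of equations (3)-(4) in that appendix: walking back from the leaf, the return is built up recursively as $G^{k-1} = r_k + \gamma G^k$ with $G^l = v^l$, so each edge's predicted immediate reward is folded in exactly once, and $Q$ is updated as a running mean of these returns. As a sketch:

```python
def backup(search_path, leaf_value, discount):
    """search_path: (node, reward) pairs from root to leaf, where reward is
    the dynamics model's predicted immediate reward on the edge into node."""
    G = leaf_value
    for node, reward in reversed(search_path):
        node.value_sum += G          # running mean: Q = value_sum / visit_count
        node.visit_count += 1
        G = reward + discount * G    # fold this edge's reward into the return
```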
2 votes · 1 answer

How to choose the first action in a Monte Carlo Tree Search?

I'm working on reimplementing the MuZero paper. In the description of the MCTS (page 12), they indicate that a new node with associated state $s$ is to be initialized with $Q(s,a) = 0$, $N(s,a) = 0$ and $P(s,a) = p_a$. From this, I understand that…
Ziofil • 128 • 7
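On the first action specifically: with $Q(s,a) = 0$ and $N(s,a) = 0$ everywhere, the pUCT exploration term is also zero (it carries a $\sqrt{\sum_b N(s,b)}$ factor), so every score ties at zero and the choice comes down to tie-breaking. A sketch of the selection rule with the paper's constants $c_1 = 1.25$ and $c_2 = 19652$:

```python
import math

def select_child(node, c1=1.25, c2=19652.0):
    total_n = sum(c.visit_count for c in node.children.values())
    def ucb(child):
        q = child.value_sum / child.visit_count if child.visit_count else 0.0
        u = child.prior * math.sqrt(total_n) / (1 + child.visit_count)
        return q + u * (c1 + math.log((total_n + c2 + 1) / c2))
    # On the very first visit total_n == 0 and every Q is 0, so all scores
    # tie at 0; max() then just returns the first child. Many implementations
    # instead tie-break randomly or by prior.
    return max(node.children.items(), key=lambda kv: ucb(kv[1]))
```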
1 vote · 1 answer

Reproducing AlphaZero/MuZero: Failed to beat initial model in arena

I am trying to reproduce AlphaZero's algorithm on the board game Carcassonne. Since I want to use the final game score difference (i.e., victory points of player 1 minus victory points of player 2) as the final and only reward, AlphaZero's UCB score can…
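One relevant detail for unbounded score-difference rewards like this: the MuZero paper min-max normalizes $Q$ over the values observed so far in the search tree before it enters the pUCT formula, precisely because its rewards and values are not confined to $[-1, 1]$. A sketch of that bookkeeping, modeled on the MinMaxStats helper in DeepMind's published pseudocode:

```python
class MinMaxStats:
    """Tracks the min and max value seen in the current search tree and
    rescales Q values into [0, 1] for the pUCT formula."""
    def __init__(self):
        self.minimum, self.maximum = float("inf"), float("-inf")

    def update(self, value):
        self.minimum = min(self.minimum, value)
        self.maximum = max(self.maximum, value)

    def normalize(self, value):
        if self.maximum > self.minimum:
            return (value - self.minimum) / (self.maximum - self.minimum)
        return value
```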
1 vote · 1 answer

Scrabble rack observation with MuZero

Currently I'm trying to implement Scrabble with MuZero. The $15 \times 15$ game board observation (as input) is of size $27 \times 15 \times 15$ (26 letters + 1 wildcard) with values of 0 or 1. However, I'm having difficulties finding a suitable way…
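One plausible option for the rack, my suggestion rather than anything from the paper: append it as extra constant-valued planes, since the representation network just consumes a stacked image anyway. Here each of the 27 letter planes is filled with that letter's count on the rack, scaled to $[0, 1]$; all names and shapes are illustrative:

```python
import numpy as np

def encode_observation(board_planes, rack_counts, rack_size=7):
    """board_planes: (27, 15, 15) 0/1 array; rack_counts: (27,) counts of
    each letter (plus wildcard) currently on the rack."""
    rack_planes = np.broadcast_to(
        (rack_counts / rack_size)[:, None, None], (27, 15, 15)
    )
    return np.concatenate([board_planes, rack_planes], axis=0)  # (54, 15, 15)
```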
1 vote · 1 answer

In which sense was AlphaGo "just given a rule book"?

I was told that AlphaGo (or some related program) was not explicitly taught even the rules of Go -- if it was "just given the rulebook", what does this mean? Literally, a book written in English to read?