Questions tagged [muzero]

For questions about the MuZero algorithm proposed in the paper "Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model" (2019).

10 questions
5 votes · 2 answers

Is it practical to train AlphaZero or MuZero (for indie games) on a personal computer?

Is it practical/affordable to train an AlphaZero/MuZero engine using a residential gaming PC, or would it take thousands of years of training for the AI to learn enough to challenge humans? I'm having trouble wrapping my head around how much…
Luke W • 53 • 3
5 votes · 1 answer

Do AlphaZero/MuZero learn faster than humans in terms of the number of games played?

I don't know much about AI and am just curious. From what I read, AlphaZero/MuZero outperform any human chess player after a few hours of training. I have no idea how many chess games a very talented human chess player on average has played before…
220284 • 153 • 4
3 votes · 1 answer

How is MuZero's second binary plane for chess defined?

From the MuZero paper (Appendix E, page 13): In chess, 8 planes are used to encode the action. The first one-hot plane encodes which position the piece was moved from. The next two planes encode which position the piece was moved to: a one-hot…
MuZeroFm • 31 • 2
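On the second binary plane asked about above: one common reading is that it simply flags whether the target square is valid (on the board) at all. A minimal sketch under that reading, where the plane ordering and the `PROMOTIONS` list are my assumptions rather than anything the paper spells out:

```python
import numpy as np

# Assumed layout of the 8 action planes (my reading, not confirmed by the
# paper beyond what the excerpt quotes):
#   plane 0: one-hot square the piece was moved from
#   plane 1: one-hot square the piece was moved to (if on the board)
#   plane 2: all-ones iff the target square is valid (on the board)
#   planes 3-7: one-hot promotion type
PROMOTIONS = ["none", "queen", "rook", "bishop", "knight"]  # assumed order

def encode_action(from_sq, to_sq=None, promotion="none"):
    """from_sq/to_sq are (row, col) pairs; to_sq is None if off the board."""
    planes = np.zeros((8, 8, 8), dtype=np.float32)
    planes[0][from_sq] = 1.0
    if to_sq is not None:
        planes[1][to_sq] = 1.0
        planes[2, :, :] = 1.0  # the "second binary plane": target is valid
    planes[3 + PROMOTIONS.index(promotion), :, :] = 1.0
    return planes

# e.g. the move e2-e4 as (row, col) pairs, no promotion:
action_planes = encode_action(from_sq=(6, 4), to_sq=(4, 4))
```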
3 votes · 1 answer

How does MuZero learn to play well for both sides of a two-player game?

I'm coding my own version of MuZero. However, I don't understand how it is supposed to learn to play well for both players in a two-player game. Take Go, for example. If I use a single MCTS to generate an entire game (to be used in the training stage),…
Ziofil • 128 • 7
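The usual AlphaZero-style resolution of the question above is that the value head always scores a position from the perspective of the player to move, so a single network can play both sides; the backup then negates the value at every ply. A minimal sketch, with node fields like `value_sum` being illustrative rather than MuZero's actual code:

```python
def backup(search_path, leaf_value):
    """search_path: nodes from root to leaf; leaf_value: value predicted
    for the player to move at the leaf."""
    value = leaf_value
    for node in reversed(search_path):
        node.value_sum += value    # Q(s, a) = value_sum / visit_count
        node.visit_count += 1
        value = -value             # parent is the opponent's decision point
```

Training targets are likewise stored from the to-move player's perspective, so the same network learns both sides from a single self-play game.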
2 votes · 2 answers

How are NN outputs set up for games that allow multiple actions each turn and have very large sets of possible actions?

I was looking at an AI coding challenge for a two-player game on a 2D grid of variable size (from one game to the next). Here is a screenshot example of the playfield. Each player has multiple units on the board. In fact, each tile can hold…
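One common workaround for the combinatorial action space described above, not something the MuZero paper itself prescribes, is to factor the policy: the network emits per-tile action logits, and each unit draws its action from the logits at its own tile. A rough sketch with purely illustrative shapes and names:

```python
import numpy as np

def sample_unit_actions(logits, unit_positions, rng=None):
    """logits: (H, W, A) per-tile action logits from the policy head;
    unit_positions: list of (row, col) tiles occupied by your units."""
    rng = rng or np.random.default_rng()
    actions = {}
    for pos in unit_positions:
        z = logits[pos]               # (A,) logits for this tile's unit
        p = np.exp(z - z.max())
        p /= p.sum()                  # softmax over this unit's actions
        actions[pos] = rng.choice(len(p), p=p)
    return actions
```

This treats units as acting independently given the shared board state, which keeps the output size linear in the board area instead of exponential in the number of units.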
2 votes · 1 answer

In the MuZero paper, how does backprop in the MCTS account for the immediate reward from each edge?

On page 12 of this paper: Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model, it describes how MCTS works for the MuZero algorithm. It states in equation 4 that during the 'backup' after a simulation, the mean value (Q) for every…
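My reading of equations (3)-(4) in that appendix: walking back from the leaf, the return is built up recursively as $G^{k-1} = r_k + \gamma G^k$ with $G^l = v^l$, so each edge's predicted immediate reward is folded in exactly once, and $Q$ is updated as a running mean of these returns. As a sketch:

```python
def backup(search_path, leaf_value, discount):
    """search_path: (node, reward) pairs from root to leaf, where reward is
    the dynamics model's predicted immediate reward on the edge into node."""
    G = leaf_value
    for node, reward in reversed(search_path):
        node.value_sum += G          # running mean: Q = value_sum / visit_count
        node.visit_count += 1
        G = reward + discount * G    # fold this edge's reward into the return
```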
2 votes · 1 answer

How to choose the first action in a Monte Carlo Tree Search?

I'm working on reimplementing the MuZero paper. In the description of the MCTS (page 12), they indicate that a new node with associated state $s$ is to be initialized with $Q(s,a) = 0$, $N(s,a) = 0$ and $P(s,a) = p_a$. From this, I understand that…
Ziofil • 128 • 7
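On the first action specifically: with $Q(s,a) = 0$ and $N(s,a) = 0$ everywhere, the pUCT exploration term is also zero (it carries a $\sqrt{\sum_b N(s,b)}$ factor), so every score ties at zero and the choice comes down to tie-breaking. A sketch of the selection rule with the paper's constants $c_1 = 1.25$ and $c_2 = 19652$:

```python
import math

def select_child(node, c1=1.25, c2=19652.0):
    total_n = sum(c.visit_count for c in node.children.values())
    def ucb(child):
        q = child.value_sum / child.visit_count if child.visit_count else 0.0
        u = child.prior * math.sqrt(total_n) / (1 + child.visit_count)
        return q + u * (c1 + math.log((total_n + c2 + 1) / c2))
    # On the very first visit total_n == 0 and every Q is 0, so all scores
    # tie at 0; max() then just returns the first child. Many implementations
    # instead tie-break randomly or by prior.
    return max(node.children.items(), key=lambda kv: ucb(kv[1]))
```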
1 vote · 1 answer

Reproducing AlphaZero/MuZero: Failed to beat initial model in arena

I am trying to reproduce AlphaZero's algorithm on the board game Carcassonne. Since I want to use the final game score difference (i.e., victory points of player 1 minus victory points of player 2) as the final and only reward, AlphaZero's UCB score can…
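One relevant detail for unbounded score-difference rewards like this: the MuZero paper min-max normalizes $Q$ over the values observed so far in the search tree before it enters the pUCT formula, precisely because its rewards and values are not confined to $[-1, 1]$. A sketch of that bookkeeping, modeled on the MinMaxStats helper in DeepMind's published pseudocode:

```python
class MinMaxStats:
    """Tracks the min and max value seen in the current search tree and
    rescales Q values into [0, 1] for the pUCT formula."""
    def __init__(self):
        self.minimum, self.maximum = float("inf"), float("-inf")

    def update(self, value):
        self.minimum = min(self.minimum, value)
        self.maximum = max(self.maximum, value)

    def normalize(self, value):
        if self.maximum > self.minimum:
            return (value - self.minimum) / (self.maximum - self.minimum)
        return value
```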
1 vote · 1 answer

Scrabble rack observation with MuZero

Currently I'm trying to implement Scrabble with MuZero. The $15 \times 15$ game board observation (as input) is of size $27 \times 15 \times 15$ (26 letters + 1 wildcard) with values of 0 or 1. However, I'm having difficulties finding a suitable way…
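One plausible option for the rack, my suggestion rather than anything from the paper: append it as extra constant-valued planes, since the representation network just consumes a stacked image anyway. Here each of the 27 letter planes is filled with that letter's count on the rack, scaled to $[0, 1]$; all names and shapes are illustrative:

```python
import numpy as np

def encode_observation(board_planes, rack_counts, rack_size=7):
    """board_planes: (27, 15, 15) 0/1 array; rack_counts: (27,) counts of
    each letter (plus wildcard) currently on the rack."""
    rack_planes = np.broadcast_to(
        (rack_counts / rack_size)[:, None, None], (27, 15, 15)
    )
    return np.concatenate([board_planes, rack_planes], axis=0)  # (54, 15, 15)
```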
1 vote · 1 answer

In which sense was AlphaGo "just given a rule book"?

I was told that AlphaGo (or some related program) was not explicitly taught even the rules of Go -- if it was "just given the rulebook", what does this mean? Literally, a book written in English to read?