Questions tagged [alphago]

For questions related to DeepMind's AlphaGo, the first computer Go program to beat a human professional Go player without handicaps on a full-sized 19x19 board. AlphaGo was introduced in the paper "Mastering the game of Go with deep neural networks and tree search" (2016) by David Silver et al. AlphaGo has had three more powerful successors: AlphaGo Master, AlphaGo Zero and AlphaZero.

See the original research paper that introduced AlphaGo, "Mastering the game of Go with deep neural networks and tree search" (2016) by David Silver et al., published in Nature. See also https://en.wikipedia.org/wiki/AlphaGo.

27 questions

1 answer

How does "Monte-Carlo search" work?

I have heard about this concept in a Reddit post about AlphaGo. I have tried to go through the paper and the article, but could not really make sense of the algorithm. So, can someone give an easy-to-understand explanation of how the Monte-Carlo…

asked Aug 05 '16 at 07:03

Dawny33

1 answer

Is AlphaZero an example of an AGI?

From DeepMind's research paper on arxiv.org: In this paper, we apply a similar but fully generic algorithm, which we call AlphaZero, to the games of chess and shogi as well as Go, without any additional domain knowledge except the rules of the…

game-ai definitions agi alphazero alphago

asked Nov 26 '18 at 00:42

Siddhartha

3 answers

Would AlphaGo Zero become perfect with enough training time?

Would AlphaGo Zero become theoretically perfect with enough training time? If not, what would be the limiting factor? (By perfect, I mean it always wins the game if possible, even against another perfect opponent.)

neural-networks monte-carlo-tree-search alphago alphazero alphago-zero

asked Sep 10 '18 at 22:31

PyRulez

4 answers

Does the recent advent of a Go-playing computer represent Artificial Intelligence?

I read that in the spring of 2016 a computer Go program was finally able to beat a professional human for the first time. Now that this milestone has been reached, does that represent a significant advance in artificial intelligence techniques or…

philosophy definitions agi alphago narrow-ai

asked Aug 02 '16 at 16:20

WilliamKF

1 answer

Why is a constant plane of ones added into the input features of AlphaGo?

In the paper Mastering the game of Go with deep neural networks and tree search, the input features of AlphaGo's networks contain a plane of constant ones and a plane of constant zeros, as follows. Feature #of planes Description Stone…

machine-learning alphago

asked Mar 05 '19 at 09:22

Yangcy

2 answers

What part of the game is the value network trained to predict a winner on?

The AlphaZero (as well as AlphaGo Zero) papers say they trained the value head of the network by "minimizing the error between the predicted winner and the game winner" throughout its many self-play games. As far as I could tell, further…

machine-learning reinforcement-learning alphago alphazero alphago-zero

asked Sep 13 '18 at 03:37

chessprogrammer

1 answer

Is the new AlphaGo implementation using Generative Adversarial Networks?

I read through the publication Mastering the game of Go without Human Knowledge. It doesn't seem to use GANs, just a new form of search and reinforcement learning.

reinforcement-learning game-ai generative-adversarial-networks monte-carlo-tree-search alphago

asked Oct 24 '17 at 13:34

dougvk

1 answer

Why did AlphaGo lose its Go game?

We can read on the wiki page that in March 2016 the AlphaGo AI lost one of its five games to Lee Sedol, a professional Go player. One article quotes: AlphaGo lost a game and we as researchers want to explore that and find out what went wrong. We need to…

game-ai deepmind alphago

asked Aug 09 '16 at 12:00

kenorb

1 answer

Why didn't the Go champion manage to win the last game against AlphaGo, after winning the 4th one?

In the documentary about the match, it is said that after losing the 4th game, AlphaGo came back stronger and started to play in a weird (not human-like) way, and that it was nearly impossible to beat. Why and how did that happen?

reinforcement-learning game-ai monte-carlo-tree-search alphago

asked Mar 26 '19 at 20:43

Jay Critch

1 answer

Why is chess still a benchmark for Artificial Intelligence?

Even though modern chess-playing programs have demonstrated themselves to be as strong as (or stronger than) even the best human players for nearly 20 years now (since 1997, when IBM's Deep Blue defeated the world chess champion Garry Kasparov), why would a…

chess intelligence-testing alphago benchmarks

asked Dec 15 '17 at 20:39

DJ2

1 answer

Would it take 1700 years to run AlphaGo Zero on commodity hardware?

From this link, AlphaGo would take millennia to run on regular hardware. They generated 29 million games for the final result, which means it's going to take me about 1700 years to replicate this. Are these calculations correct?

deep-rl alphago-zero computational-complexity alphago

asked Nov 30 '17 at 11:44

BlueMoon93

1 answer

AlphaZero policy head loss not decreasing

I am now working on training an AlphaZero player for a board game. The implementation of the board game is mine; the MCTS for AlphaZero was taken from elsewhere. Due to the complexity of the game, it takes much longer to self-play than to train. As you know,…

neural-networks reinforcement-learning objective-functions alphago alphazero

asked Apr 24 '19 at 09:08

ytolochko

1 answer

AlphaGo Zero: does $Q(s_t, a)$ dominate $U(s_t, a)$ in difficult game states?

AlphaGo Zero uses a Monte-Carlo Tree Search where the selection phase is governed by $\operatorname*{argmax}\limits_a\left( Q(s_t, a) + U(s_t, a) \right)$, where: the exploitation parameter is $Q(s_t, a) = \displaystyle…

reinforcement-learning monte-carlo-tree-search alphazero alphago-zero alphago

asked Dec 03 '20 at 03:14

user3667125

0 answers

What does "convolve k filters" mean in the AlphaGo paper?

On page 27 of the DeepMind AlphaGo paper appears the following sentence: The first hidden layer zero pads the input into a $23 \times 23$ image, then convolves $k$ filters of kernel size $5 \times 5$ with stride $1$ with the input image and applies…
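As a rough illustration of the quoted sentence, the sketch below zero-pads a stack of $19 \times 19$ planes to $23 \times 23$ and slides $k$ filters of size $5 \times 5$ over it with stride $1$, producing $k$ output planes of size $19 \times 19$ (since $23 - 5 + 1 = 19$). The function name `convolve_k_filters`, the plane count of 3 and the filter count of 4 are assumptions chosen for the example, not values from the paper, and the operation shown is the unflipped sliding dot product that deep learning libraries usually call convolution.

```python
import numpy as np

def convolve_k_filters(padded, filters):
    """Slide k filters over a padded input with stride 1 (hypothetical helper).

    padded:  array of shape (channels, height, width)
    filters: array of shape (k, channels, kernel_h, kernel_w)
    returns: array of shape (k, height - kernel_h + 1, width - kernel_w + 1)
    """
    _, height, width = padded.shape
    k, _, kernel_h, kernel_w = filters.shape
    out_h = height - kernel_h + 1
    out_w = width - kernel_w + 1
    out = np.zeros((k, out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # Patch under the kernel window, across all input channels.
            patch = padded[:, i:i + kernel_h, j:j + kernel_w]
            # One dot product per filter gives one value per output plane.
            out[:, i, j] = np.tensordot(filters, patch, axes=([1, 2, 3], [0, 1, 2]))
    return out

# Illustrative sizes only: 3 input planes and k = 4 filters are assumptions.
planes = np.ones((3, 19, 19))
padded = np.pad(planes, ((0, 0), (2, 2), (2, 2)))  # zero pad 19x19 -> 23x23
filters = np.ones((4, 3, 5, 5))
feature_maps = convolve_k_filters(padded, filters)
```

Each of the $k$ filters spans every input plane, so "convolving $k$ filters" yields $k$ feature maps, one per filter, each the same size as the original board.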

convolutional-neural-networks convolution alphago filters convolutional-layers

asked Aug 19 '20 at 04:28

William Ehlhardt

2 answers

How does the AlphaGo Zero policy decide what move to execute?

I was going through the AlphaGo Zero paper and I was trying to understand everything, but I just can't figure out this one formula: $$ \pi(a \mid s_0) = \frac{N(s_0, a)^{\frac{1}{\tau}}}{\sum_b N(s_0, b)^{\frac{1}{\tau}}} $$ Could someone decode how…
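A minimal sketch of how the formula above can be evaluated, assuming the root visit counts $N(s_0, a)$ are available as a plain list; the function name `visit_count_policy` and the sample counts are hypothetical. Each count is raised to the power $1/\tau$ and the results are normalised to sum to one, so $\tau = 1$ gives probabilities proportional to the visit counts, and smaller $\tau$ concentrates probability on the most visited move.

```python
import numpy as np

def visit_count_policy(visit_counts, tau=1.0):
    """Turn root visit counts N(s_0, a) into move probabilities pi(a | s_0)."""
    counts = np.asarray(visit_counts, dtype=np.float64)
    if tau <= 0:
        raise ValueError("tau must be positive")
    if counts.max() <= 0:
        raise ValueError("at least one move must have been visited")
    # Dividing by the largest count first avoids overflow for small tau;
    # the common factor cancels in the normalisation below.
    scaled = (counts / counts.max()) ** (1.0 / tau)
    return scaled / scaled.sum()

# Hypothetical visit counts for three candidate moves.
counts = [10, 30, 60]
proportional = visit_count_policy(counts, tau=1.0)   # proportional to counts
sharper = visit_count_policy(counts, tau=0.5)        # counts squared, then normalised
near_greedy = visit_count_policy(counts, tau=0.01)   # almost all mass on the top move

# Sampling a move index from the resulting distribution.
rng = np.random.default_rng(0)
move = rng.choice(len(counts), p=proportional)
```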

reinforcement-learning policies deepmind alphago-zero alphago

asked May 07 '20 at 11:09

Eloi M.
