Questions tagged [alphago]

For questions related to DeepMind's AlphaGo, which is the first computer Go program to beat a human professional Go player without handicaps on a full-sized 19x19 board. AlphaGo was introduced in the paper "Mastering the game of Go with deep neural networks and tree search" (2016) by David Silver et al. There have been three more powerful successors of AlphaGo: AlphaGo Master, AlphaGo Zero and AlphaZero.

Have a look at the original research paper that introduced AlphaGo, "Mastering the game of Go with deep neural networks and tree search" (2016) by David Silver et al., published in Nature. See also https://en.wikipedia.org/wiki/AlphaGo.

27 questions
17
votes
1 answer

How does "Monte-Carlo search" work?

I have heard about this concept in a Reddit post about AlphaGo. I have tried to go through the paper and the article, but could not really make sense of the algorithm. So, can someone give an easy-to-understand explanation of how the Monte-Carlo…
Dawny33
  • 1,371
  • 13
  • 29
10
votes
1 answer

Is AlphaZero an example of an AGI?

From DeepMind's research paper on arxiv.org: In this paper, we apply a similar but fully generic algorithm, which we call AlphaZero, to the games of chess and shogi as well as Go, without any additional domain knowledge except the rules of the…
Siddhartha
  • 413
  • 2
  • 11
7
votes
3 answers

Would AlphaGo Zero become perfect with enough training time?

Would AlphaGo Zero become theoretically perfect with enough training time? If not, what would be the limiting factor? (By perfect, I mean it always wins the game if possible, even against another perfect opponent.)
7
votes
4 answers

Does the recent advent of a Go playing computer represent Artificial Intelligence?

I read that in the spring of 2016 a computer Go program was finally able to beat a professional human for the first time. Now that this milestone has been reached, does that represent a significant advance in artificial intelligence techniques or…
WilliamKF
  • 2,493
  • 1
  • 24
  • 31
6
votes
1 answer

Why is a constant plane of ones added into the input features of AlphaGo?

In the paper Mastering the game of Go with deep neural networks and tree search, the input features of AlphaGo's networks contain a plane of constant ones and a plane of constant zeros, as follows. Feature # of planes Description Stone…
Yangcy
  • 61
  • 2
5
votes
2 answers

What part of the game is the value network trained to predict a winner on?

The AlphaZero (as well as AlphaGo Zero) papers say they trained the value head of the network by "minimizing the error between the predicted winner and the game winner" throughout its many self-play games. As far as I could tell, further…
5
votes
1 answer

Is the new AlphaGo implementation using Generative Adversarial Networks?

I read through the publication Mastering the game of Go without Human Knowledge. It doesn't seem to use GANs, just a new form of search and reinforcement learning.
5
votes
1 answer

Why did AlphaGo lose its Go game?

We can read on the wiki page that in March 2016 the AlphaGo AI lost one of its five games to Lee Sedol, a professional Go player. One cited article says: AlphaGo lost a game and we as researchers want to explore that and find out what went wrong. We need to…
kenorb
  • 10,423
  • 3
  • 43
  • 91
5
votes
1 answer

Why didn't the Go champion manage to win the last game against AlphaGo, after winning the 4th one?

In the documentary about the match, it is said that after losing the 4th game, AlphaGo came back stronger, started to play in a weird (not human-like) way, and was nearly impossible to beat. Why and how did that happen?
4
votes
1 answer

Why is chess still a benchmark for Artificial Intelligence?

Even though modern chess-playing programs have demonstrated themselves to be as strong as (or stronger than) even the best human players for nearly 20 years now (since 1997, when IBM's Deep Blue defeated the world chess champion Garry Kasparov), why would a…
DJ2
  • 143
  • 3
4
votes
1 answer

Would it take 1700 years to run AlphaGo Zero on commodity hardware?

According to this link, AlphaGo would take millennia to run on regular hardware. They generated 29 million games for the final result, which means it's going to take me about 1700 years to replicate this. Are these calculations correct?
4
votes
1 answer

AlphaZero policy head loss not decreasing

I am currently training an AlphaZero player for a board game. The implementation of the board game is mine; the MCTS for AlphaZero was taken from elsewhere. Due to the complexity of the game, self-play takes much longer than training. As you know,…
3
votes
1 answer

AlphaGo Zero: does $Q(s_t, a)$ dominate $U(s_t, a)$ in difficult game states?

AlphaGo Zero uses a Monte-Carlo Tree Search where the selection phase is governed by $\operatorname*{argmax}\limits_a\left( Q(s_t, a) + U(s_t, a) \right)$, where: the exploitation parameter is $Q(s_t, a) = \displaystyle…
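A minimal sketch of the selection rule quoted in this excerpt may help frame the question. The excerpt is cut off before $U(s_t, a)$ is defined, so the form used below, $U = c_{puct} \, P(s_t, a) \, \sqrt{\sum_b N(s_t, b)} / (1 + N(s_t, a))$, is an assumption taken from the usual PUCT description, and `select_action`, the example statistics, and the `c_puct` values are hypothetical.

```python
import math

def select_action(Q, N, P, c_puct=1.0):
    """Return the action maximizing Q(s, a) + U(s, a) at a single node.

    Q, N, P are dicts keyed by action: mean value, visit count, prior.
    The U term below is an assumed PUCT form, not quoted from the excerpt.
    """
    total_visits = sum(N.values())

    def score(action):
        exploration = c_puct * P[action] * math.sqrt(total_visits) / (1 + N[action])
        return Q[action] + exploration

    return max(Q, key=score)

# Hypothetical node: "a" looks good and is heavily visited, "b" is barely explored.
Q = {"a": 0.5, "b": 0.1}
N = {"a": 100, "b": 1}
P = {"a": 0.5, "b": 0.5}

explore_choice = select_action(Q, N, P, c_puct=1.0)  # U is large for "b"
exploit_choice = select_action(Q, N, P, c_puct=0.1)  # Q dominates the sum
```

With these numbers, the larger `c_puct` lets the exploration bonus of the rarely visited move outweigh its lower value estimate, while the smaller `c_puct` lets $Q$ decide.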
3
votes
0 answers

What does "convolve k filters" mean in the AlphaGo paper?

On page 27 of the DeepMind AlphaGo paper appears the following sentence: The first hidden layer zero pads the input into a $23 \times 23$ image, then convolves $k$ filters of kernel size $5 \times 5$ with stride $1$ with the input image and applies…
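The quoted sentence can be illustrated with a small pure-Python sketch. It assumes a single 19x19 input plane (the real network stacks many planes and each filter sums over all of them) and infers a padding of 2 on each side from 19 + 2 * 2 = 23; the helper names and the all-ones kernels are hypothetical. "Convolving $k$ filters" here means sliding each of the $k$ kernels over the padded image, which produces $k$ output feature maps.

```python
def zero_pad(plane, pad):
    """Surround a 2D list with `pad` rows and columns of zeros."""
    width = len(plane[0]) + 2 * pad
    padded = [[0.0] * width for _ in range(pad)]
    for row in plane:
        padded.append([0.0] * pad + list(row) + [0.0] * pad)
    padded.extend([0.0] * width for _ in range(pad))
    return padded

def convolve(plane, kernel, stride=1):
    """Slide one kernel over the plane (no extra padding) and sum the products."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = (len(plane) - kh) // stride + 1
    out_w = (len(plane[0]) - kw) // stride + 1
    return [
        [
            sum(
                plane[r * stride + i][c * stride + j] * kernel[i][j]
                for i in range(kh)
                for j in range(kw)
            )
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]

def convolve_k_filters(plane, kernels, stride=1):
    """Apply each of the k kernels separately: one output map per filter."""
    return [convolve(plane, kernel, stride) for kernel in kernels]

board = [[1.0] * 19 for _ in range(19)]                           # toy 19x19 input plane
padded = zero_pad(board, 2)                                       # 23x23 after zero padding
kernels = [[[1.0] * 5 for _ in range(5)] for _ in range(3)]       # k = 3 filters of size 5x5
feature_maps = convolve_k_filters(padded, kernels)                # 3 maps, each 19x19
```

Because (23 - 5) / 1 + 1 = 19, each of the $k$ output maps has the same 19x19 size as the original board.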
3
votes
2 answers

How does the AlphaGo Zero policy decide what move to execute?

I was going through the AlphaGo Zero paper and I was trying to understand everything, but I just can't figure out this one formula: $$ \pi(a \mid s_0) = \frac{N(s_0, a)^{\frac{1}{\tau}}}{\sum_b N(s_0, b)^{\frac{1}{\tau}}} $$ Could someone decode how…
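As a rough illustration of the formula in this excerpt, the sketch below turns root visit counts $N(s_0, a)$ into move probabilities for a given temperature $\tau$. The function name and the example counts are hypothetical, and dividing by the largest count first is only a numerical convenience that leaves the result unchanged.

```python
def move_probabilities(visit_counts, tau=1.0):
    """Compute pi(a | s_0) = N(s_0, a)^(1/tau) / sum_b N(s_0, b)^(1/tau).

    Assumes tau > 0 and at least one positive visit count.
    """
    if tau <= 0:
        raise ValueError("tau must be positive")
    largest = max(visit_counts)
    # Rescaling by the largest count avoids overflow for small tau
    # and cancels out in the normalization below.
    scaled = [(count / largest) ** (1.0 / tau) for count in visit_counts]
    total = sum(scaled)
    return [value / total for value in scaled]

counts = [10, 30, 60]                                  # hypothetical root visit counts
proportional = move_probabilities(counts, tau=1.0)     # proportional to visit counts
near_greedy = move_probabilities(counts, tau=0.1)      # almost all mass on the most visited move
```

With $\tau = 1$ the move is sampled in proportion to how often the search visited it; as $\tau$ approaches 0 the distribution concentrates on the most visited move.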