
In the documentary about the match, it is said that after losing the fourth game, AlphaGo came back stronger and started to play in a strange, non-human way, and it became virtually impossible to beat. Why and how did that happen?

Jay Critch
    This might actually be better as a "soft question" since I'm not sure we can know definitively. The simple answer is that Go is still unsolved, so both Sedol and AlphaGo were surely making plenty of "mistakes" (non-optimal placements), and while Sedol's choices were stronger in game 4, MCTS is relentless in chipping away at any advantage. *(I discovered this recently playing Taverner's AIAI in partisan sudoku: for the strong AI, errors aren't an insurmountable obstacle to play out from under.)* – DukeZhou Mar 26 '19 at 20:56

1 Answer


The technique used by AlphaGo is Monte Carlo Tree Search (MCTS), combined with a very well-trained neural network. The network's job is to estimate the quality of different board states and moves. This estimation is deterministic: if you show AlphaGo the same board on two different occasions, it judges it to be exactly as good (or bad) on both occasions.
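To illustrate that deterministic property, here is a tiny Python sketch. The hash-based `value_network` below is a made-up stand-in, not AlphaGo's actual network; the only property it shares with the real thing is that the same input always yields the same output.

```python
import hashlib

def value_network(board):
    """Stand-in for AlphaGo's value network (purely illustrative): it maps a
    board position to a fixed score in [0, 1]. Like the real network, it is
    deterministic: the same board always produces the same score."""
    digest = hashlib.sha256(str(board).encode()).digest()
    return digest[0] / 255.0

board = (("B", "W", None),
         (None, "B", None),
         (None, None, "W"))

print(value_network(board) == value_network(board))  # True: same board, same score
```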

Monte Carlo Tree Search, however, is a randomized algorithm. As a simplified explanation, this is how AlphaGo decides which move to make (a rough code sketch follows the list):

  1. Look at the current board.
  2. Pick a random move, and imagine what the board would look like if you made that move.
  3. Pick a random move for your opponent, and imagine what the board would look like if they made that move.
  4. Keep doing steps 2 & 3 for a while, so that we're imagining being at some board many moves in the future along a random line of play.
  5. Ask the neural network how good this board state is.
  6. Repeat steps 1-5 many times. Then, for the move to make right now, choose whichever first move led to the best lines of play on average.
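To make that loop concrete, here is a minimal Python sketch under loose assumptions. The helpers `legal_moves`, `apply_move`, and `value_network` are hypothetical callables supplied by the caller, and the uniformly random move choice replaces AlphaGo's policy-network-guided selection and tree statistics. It illustrates the idea of the randomized search, not AlphaGo's actual implementation.

```python
import random

def random_rollout_search(board, legal_moves, apply_move, value_network,
                          n_simulations=1000, depth=10):
    """Simplified sketch of the randomized search described above. It omits
    the policy network, the UCT selection rule, and the incrementally built
    search tree, and picks moves uniformly at random instead."""
    totals = {move: 0.0 for move in legal_moves(board)}     # step 1: current board
    counts = {move: 0 for move in totals}

    for _ in range(n_simulations):
        first_move = random.choice(list(totals))            # step 2: random own move
        state = apply_move(board, first_move)
        for _ in range(depth):                              # steps 3-4: random line of play
            moves = legal_moves(state)
            if not moves:
                break
            state = apply_move(state, random.choice(moves))
        totals[first_move] += value_network(state)          # step 5: ask the network
        counts[first_move] += 1

    # step 6: play the first move whose random continuations scored best on average
    return max(totals, key=lambda m: totals[m] / max(counts[m], 1))
```

With toy stand-ins for the three helpers (for example, coordinates as moves and the hash-based scorer sketched earlier), two calls to `random_rollout_search` on the same board can return different moves, which is exactly the non-determinism discussed below.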

What this means is that AlphaGo won't always play the same way, because it doesn't actually consider every possible move explicitly. It just examines enough lines of play to be fairly confident that one move is better than another. This is actually not so far removed from how humans play most of the middle of games like these.

So, in game 4, Sedol essentially got lucky. The random lines of play that AlphaGo chose to examine did not capture some critical facts about one or more board states, which led it to make a mistake. If you asked it to play the same game through again, it might not make the same mistakes (it might consider different random lines of play, ones that do capture the critical facts it missed the first time). Further, it might choose to play slightly differently on other moves, which could have a big impact on the rest of the game. These two factors prevented Sedol from simply playing game 4 over again.

John Doucette