11

I want to start with a scenario that got me thinking about how well MCTS can perform: suppose there is a move that has not yet been added to the search tree because it lies a few layers/moves too deep, but playing it basically wins the game. Also assume that all the moves that could be taken instead at that game state are very, very bad. For the sake of argument, say there are 1000 possible moves, only one of which is good (but very good), while the rest are very bad. Wouldn't MCTS fail to recognize this, rate this subtree very badly, and never grow the search tree towards the winning move? I know that MCTS eventually converges to minimax (and, given enough memory, it will eventually build the whole tree), so in the limit it should recognize that the move is good despite the many bad possibilities. But I guess in practice this is not something one can rely on. Maybe someone can tell me whether this evaluation on my part is correct.
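To make the worry concrete, here is a rough sketch of what I mean (my own numbers, assuming a win counts as 1.0 and a loss as 0.0, which is a common MCTS convention): a uniform-random playout from this position finds the one good move with probability 1/1000, so the node's estimated value stays close to zero.

```python
import random

def random_playout_value(n_moves=1000, n_good=1, n_playouts=10_000, seed=0):
    """Estimate the node's value when playouts pick moves uniformly at random."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(n_playouts):
        # A random playout stumbles onto the single winning move with
        # probability n_good / n_moves (here 1/1000).
        if rng.randrange(n_moves) < n_good:
            wins += 1
    return wins / n_playouts

# Prints a value near 0.001: the subtree looks terrible on average,
# even though it contains a guaranteed win.
print(random_playout_value())
```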

Apart from this special scenario, I'd also like to know whether there are other scenarios where MCTS performs badly (or extraordinarily well).

Nocta
  • 111
  • 2
  • MCTS is probabilistic. As such, it needs clues or it won't find anything: seek a needle in a haystack this way and you will fail. It would be good if you could come up with a more realistic example and ask what the optimal strategy for it would be. That might give a hint of how to better find the needles in the haystack. – NoDataDumpNoContribution Sep 06 '16 at 20:00

1 Answer

2

Whether the move is found, and how quickly, depends on a few things. If I understand correctly, there is a sequence of many "bad" moves leading to the "big win" move, and you are afraid that the MCTS algorithm will never reach it because it keeps selecting more promising moves further up the tree. Some things to think about (see also the Wikipedia MCTS article):

  • when doing playouts, you can play the game either only a few moves further or all the way down to the game end. Playing only a few moves further is obviously quicker, but in the extreme case you described it would not be the best choice. If you know such scenarios exist, make sure your playouts play the game to the end.

  • when doing playouts, you can choose your moves/actions either randomly or based on some simple, greedy (quick) heuristics tailored to your problem. Are there greedy heuristics for your game/problem designed to find, or take into account, such scenarios? If so, implement them; this is then called a "heavy playout". Compare the results against playouts using random moves.

  • If you choose actions using UCT (Upper Confidence Bound applied to Trees), the first term of the expression is responsible for exploitation: moves with a high average win ratio are preferred. The second term corresponds to exploration: if the exploration parameter is set high enough (test empirically for your problem), moves with few simulations are preferred. High exploration would be another way to find your golden move, at the expense of exploitation (read about the exploration/exploitation dilemma).
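As a sketch of that last point (the standard UCT formula; the child statistics below are made up purely for illustration): each child i is scored as w_i/n_i + c * sqrt(ln(N)/n_i), where w_i/n_i is its average win rate, N is the parent's visit count, and c is the exploration parameter.

```python
import math

def uct_score(wins, visits, parent_visits, c=math.sqrt(2)):
    """Standard UCT: exploitation term + c-scaled exploration term."""
    if visits == 0:
        return float("inf")  # unvisited children are always tried first
    return wins / visits + c * math.sqrt(math.log(parent_visits) / visits)

def select_child(children, c=math.sqrt(2)):
    """children: list of (wins, visits) pairs; returns index of best child."""
    parent_visits = sum(v for _, v in children)
    return max(range(len(children)),
               key=lambda i: uct_score(*children[i], parent_visits, c))

# A well-explored promising move vs. a rarely-visited one with a poor
# win rate: a large enough c makes the rarely-visited child win the
# comparison despite its low average.
children = [(60, 100), (1, 20)]
print(select_child(children, c=0.5))  # → 0 (exploitation dominates)
print(select_child(children, c=3.0))  # → 1 (exploration dominates)
```

Tuning c is exactly the empirical test mentioned above: sweep it for your problem and see where the golden move starts being found.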

If you describe a realistic game or problem scenario, we may be able to help you come up with a suitable strategy.

AlexGuevara
  • 263
  • 1
  • 8