
I have been trying to understand why MCTS is so important to the performance of RL agents, and the best explanation I found was in the paper Bootstrapping from Game Tree Search, which states:

Deterministic, two-player games such as chess provide an ideal test-bed for search bootstrapping. The intricate tactics require a significant level of search to provide an accurate position evaluation; learning without search has produced little success in these domains.

However, I don't understand why this is the case, or why value-based methods are unable to achieve similar performance on their own.

So my question would be:

  • What are the main advantages of incorporating search-based algorithms with value-based methods?
Hossam
  • 33
  • 3
  • In short and simplified terms, MCTS is like being able to look into the future based on experience and choose the best action :-) – David Apr 04 '21 at 22:29
  • @DavidIreland yes, but value-based methods approximate the expected return of the future as well, so why would looking at the future while executing be more accurate than our value estimate? – Hossam Apr 04 '21 at 22:34
  • we can only estimate our values using function approximation, so our estimates will never be _true_ (because we have far more states than weights). If we can look at the (approximated) value of the states we reach in, say, 5 actions' time, it is better to make a decision based on these estimations, taking into account the true rewards observed along the 5 actions. Further, MCTS also allows more implicit exploration, as when choosing the actions to expand the tree we are potentially choosing lots of non-greedy actions that lead to better future returns. – David Apr 05 '21 at 00:34
  • @DavidIreland That comment looks like the basis of a good answer already – Neil Slater Apr 05 '21 at 08:01

1 Answer


Assuming a continuous/uncountable state space, we can only estimate our value function using function approximation, so our estimates will never be true for all states simultaneously (because, loosely speaking, we have far more states than weights). If we can look at the (approximated) value of the states we would reach in, say, 5 actions' time, it is better to make a decision based on those estimates together with the true rewards observed along those 5 actions.
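To make the idea concrete, here is a minimal sketch of a depth-limited lookahead that accumulates the true (simulated) rewards along each path and bootstraps with the learned value function at the leaves. The names `model.step(state, action) -> (next_state, reward)`, `actions(state)` and `value(state)` are hypothetical stand-ins, not any particular library's API.

```python
# Minimal sketch: depth-limited lookahead combining true (simulated) rewards
# with a learned value estimate at the leaves.
# `model`, `actions` and `value` are hypothetical stand-ins, not a real API.

def lookahead_value(state, depth, model, actions, value, gamma=0.99):
    """Best discounted return reachable from `state` within `depth` actions,
    bootstrapping with the (approximate) value function at the leaves."""
    if depth == 0:
        return value(state)  # fall back on the learned value estimate
    best = float("-inf")
    for a in actions(state):
        next_state, reward = model.step(state, a)  # true reward from the simulator
        best = max(best, reward + gamma * lookahead_value(
            next_state, depth - 1, model, actions, value, gamma))
    return best

def choose_action(state, model, actions, value, depth=5, gamma=0.99):
    """Pick the action whose depth-limited lookahead score is highest."""
    def score(a):
        next_state, reward = model.step(state, a)
        return reward + gamma * lookahead_value(
            next_state, depth - 1, model, actions, value, gamma)
    return max(actions(state), key=score)
```

The decision now depends on the value estimates of many states several steps ahead, plus rewards the simulator actually returned, rather than on a single (possibly inaccurate) estimate at the current state.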

Further, MCTS also allows more implicit exploration: when choosing which actions to expand in the tree, we are potentially choosing lots of non-greedy actions that turn out to lead to better future returns.
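The selection rule that produces this exploration is typically UCB1 (as in UCT). A minimal sketch, assuming `child_value` (sum of returns per action) and `child_visits` dictionaries as a node's bookkeeping (both hypothetical names):

```python
import math

# Minimal sketch of UCB1 selection inside an MCTS node.
# `child_value[a]` is the sum of returns observed after action `a`,
# `child_visits[a]` the number of times `a` was tried (hypothetical bookkeeping).

def select_action(child_value, child_visits, c=1.4):
    total_visits = sum(child_visits.values())
    def ucb(a):
        if child_visits[a] == 0:
            return float("inf")  # unvisited actions are tried first
        exploit = child_value[a] / child_visits[a]  # average return so far
        explore = c * math.sqrt(math.log(total_visits) / child_visits[a])
        return exploit + explore
    return max(child_value, key=ucb)
```

Because the exploration term grows for rarely visited children, actions that currently look non-greedy still get expanded, which is how the search can discover lines that the raw value estimates alone would have missed.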

David
  • 4,591
  • 1
  • 6
  • 25
  • 2
    I would note that technically, "because we have far more states than weights" is not the full story as to why our estimates may not be 100% correct. We could probably come up with some weird MDP where there are many different states, each represented by a single variable, and even linear function approximator with just a single weight is sufficient to perfectly represent the optimal value function. In general, in "real-world" / "interesting" problems, it's roughly true-ish though :) – Dennis Soemers Apr 05 '21 at 11:06
  • 2
    @DennisSoemers I’ve added a disclaimer for you. :) – David Apr 05 '21 at 11:19
  • @DavidIreland I think your point is that using value-based methods alone is like having perfect confidence in what would happen in the future, while MCTS simulates the future, so it should have a better approximation of the value, am I right? – Hossam Apr 05 '21 at 18:08
  • 1
    Yes, that sounds correct @Hossam. MCTS essentially allows the agent to search(/plan) and make decisions using real transitions that are simulated in the environment to better help with the decision making. – David Apr 05 '21 at 18:15