According to Wikipedia, in the selection step of Monte Carlo tree search you should choose the child node that maximizes the value:

$${\displaystyle {\frac {w_{i}}{n_{i}}}+c{\sqrt {\frac {\ln N_{i}}{n_{i}}}}},$$

where

  • ${w_{i}}$ stands for the number of wins for the node considered after the $i$-th move,

  • ${n_{i}}$ stands for the number of simulations for the node considered after the $i$-th move,

  • $N_{i}$ stands for the total number of simulations run by the parent of the node considered after the $i$-th move,

  • $c$ is the exploration parameter, theoretically equal to $\sqrt{2}$; in practice it is usually chosen empirically.
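
To make sure I am reading the formula correctly, here is a minimal Python sketch of it. The names `uct_score` and `select_child` are my own, and giving unvisited nodes an infinite score is a common convention I am assuming, not something stated in the formula above.

```python
import math


def uct_score(wins, visits, parent_visits, c=math.sqrt(2)):
    """Value of the formula above for one child node.

    wins          -- w_i, number of wins recorded for the child
    visits        -- n_i, number of simulations through the child
    parent_visits -- N_i, number of simulations through the parent
    c             -- exploration parameter (sqrt(2) by default)
    """
    if visits == 0:
        # Assumption: unvisited children are tried first, which avoids
        # dividing by zero.
        return math.inf
    exploitation = wins / visits
    exploration = c * math.sqrt(math.log(parent_visits) / visits)
    return exploitation + exploration


def select_child(children, parent_visits, c=math.sqrt(2)):
    """Return the index of the (wins, visits) pair with the highest score."""
    return max(
        range(len(children)),
        key=lambda i: uct_score(children[i][0], children[i][1], parent_visits, c),
    )
```

With `c = 0` the choice is purely greedy on the win rate, and larger values of `c` favour children that have been simulated less often, so the question is really why $\sqrt{2}$ is the right balance between the two terms.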

Wikipedia (like other sources I have seen) claims that the theoretically ideal value for $c$ is $\sqrt{2}$. Where does this value come from?

(Note: I posted this same question on Cross Validated before I knew about this more relevant site.)

  • It's not the same question, but the answer to [this question](https://ai.stackexchange.com/q/24221/2444) should contain the answer to yours. You may want to write a formal answer to your own question once you have understood the explanation in the paper and/or answer. – nbro Dec 10 '20 at 16:28