Why is the ideal exploration parameter in the UCT algorithm $\sqrt{2}$?

Asked Dec 10 '20 at 16:25

According to Wikipedia, at each step of the Monte Carlo tree search algorithm you should choose the node that maximizes the value:

$$\frac{w_i}{n_i} + c\sqrt{\frac{\ln N_i}{n_i}},$$

where

- $w_i$ stands for the number of wins for the node considered after the $i$-th move;
- $n_i$ stands for the number of simulations for the node considered after the $i$-th move;
- $N_i$ stands for the total number of simulations after the $i$-th move run by the parent node of the one considered;
- $c$ is the exploration parameter, theoretically equal to $\sqrt{2}$; in practice it is usually chosen empirically.
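
For concreteness, here is a minimal Python sketch of how I understand the selection rule above. The function names `uct_score` and `select_child`, the `(wins, visits)` tuple layout, and the convention of giving unvisited nodes an infinite score are my own assumptions for illustration; they are not part of the Wikipedia description.

```python
import math


def uct_score(wins, visits, parent_visits, c=math.sqrt(2)):
    """Return w_i / n_i + c * sqrt(ln(N_i) / n_i) for a single child node."""
    if visits == 0:
        # Assumption: an unvisited child is always tried first.
        return math.inf
    exploitation = wins / visits
    exploration = c * math.sqrt(math.log(parent_visits) / visits)
    return exploitation + exploration


def select_child(children, parent_visits, c=math.sqrt(2)):
    """Return the index of the child with the highest UCT score.

    `children` is a list of (wins, visits) tuples (hypothetical layout).
    """
    return max(
        range(len(children)),
        key=lambda i: uct_score(children[i][0], children[i][1], parent_visits, c),
    )
```

With `c = 0` the rule is purely greedy on the win rate, while a larger `c` shifts the choice toward children that have been simulated less often, which is why the value of `c` matters.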

Wikipedia (and other sources I've seen) claims that the theoretically ideal value for $c$ is $\sqrt{2}$. Where does this value come from?

(Note: I posted this same question on Cross Validated before I knew about this, more relevant, site.)

edited Dec 10 '20 at 17:34

nbro

asked Dec 10 '20 at 16:25

Gilad Felsen

It's not the same question, but the answer to [this question](https://ai.stackexchange.com/q/24221/2444) should contain the answer to yours. You may want to write a formal answer to your own question once you have understood the explanation in the paper and/or the answer. – nbro Dec 10 '20 at 16:28

0 Answers