
I am studying the different action-selection methods used in reinforcement learning.

I found several methods, such as epsilon-greedy, softmax, upper confidence bound (UCB) and Thompson sampling.

I understand the principle behind each of these methods except Thompson sampling.

I can't follow the idea behind it, how it works, or the steps it takes to select an action.

If you could explain the principle and workings of Thompson sampling with a simple example, I would be grateful.
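From what I have read so far, the basic loop for the simplest case seems to be roughly the following. This is only my own sketch under assumptions I chose: a Bernoulli bandit (each action pays reward 1 or 0), an independent Beta(1, 1) prior per action, and made-up arm probabilities `[0.2, 0.5, 0.75]`; the function name `thompson_sampling` is hypothetical. Please correct it if I have misunderstood the steps.

```python
import random


def thompson_sampling(true_probs, n_steps, seed=0):
    """Sketch of Thompson sampling on a simulated Bernoulli bandit.

    Assumption: each arm's unknown success probability has a Beta(1, 1)
    (uniform) prior, so its posterior is Beta(1 + successes, 1 + failures).
    """
    rng = random.Random(seed)
    k = len(true_probs)
    successes = [0] * k  # number of times each arm returned reward 1
    failures = [0] * k   # number of times each arm returned reward 0
    pulls = [0] * k      # number of times each arm was selected

    for _ in range(n_steps):
        # Step 1: draw one plausible success probability per arm
        # from that arm's current Beta posterior.
        samples = [rng.betavariate(1 + successes[a], 1 + failures[a])
                   for a in range(k)]
        # Step 2: act greedily with respect to the sampled values.
        action = max(range(k), key=lambda a: samples[a])
        # Step 3: observe a reward from the simulated environment.
        reward = 1 if rng.random() < true_probs[action] else 0
        # Step 4: update only the posterior of the chosen arm.
        if reward:
            successes[action] += 1
        else:
            failures[action] += 1
        pulls[action] += 1

    return pulls, successes, failures


pulls, s, f = thompson_sampling([0.2, 0.5, 0.75], n_steps=2000)
print(pulls)  # how often each arm was chosen
```

If I understand correctly, exploration comes from the randomness of the posterior samples: arms with few observations have wide posteriors and occasionally produce high samples, while an arm that has proved good is sampled high most of the time.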

Neil Slater
