I often see Thompson Sampling in the RL literature; however, I am not able to relate it to any of the current RL techniques. How exactly does it fit into RL?
1 Answer
Thompson Sampling (TS) is mostly used in the context of multi-armed bandits, which are a special case of the RL problem (roughly, an MDP with a single state).
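To make the bandit case concrete, here is a minimal sketch of TS for a Bernoulli bandit with Beta posteriors. The function name `thompson_sampling_bernoulli`, the uniform Beta(1, 1) prior, and the simulated `true_means` are my own assumptions for illustration; they are not taken from any of the references.

```python
import numpy as np


def thompson_sampling_bernoulli(true_means, n_rounds=2000, seed=0):
    """Beta-Bernoulli Thompson Sampling on a simulated bandit.

    true_means: hypothetical success probability of each arm
                (used only to simulate rewards, hidden from the agent).
    Returns the number of pulls per arm and the posterior mean per arm.
    """
    rng = np.random.default_rng(seed)
    n_arms = len(true_means)

    # Beta(1, 1) prior on every arm (an assumption: a uniform prior).
    alpha = np.ones(n_arms)  # 1 + number of observed successes
    beta = np.ones(n_arms)   # 1 + number of observed failures
    pulls = np.zeros(n_arms, dtype=int)

    for _ in range(n_rounds):
        # 1. Sample one plausible mean per arm from its posterior.
        sampled_means = rng.beta(alpha, beta)
        # 2. Act greedily with respect to the sampled means.
        arm = int(np.argmax(sampled_means))
        # 3. Observe a simulated Bernoulli reward.
        reward = int(rng.random() < true_means[arm])
        # 4. Update the posterior of the pulled arm only.
        alpha[arm] += reward
        beta[arm] += 1 - reward
        pulls[arm] += 1

    return pulls, alpha / (alpha + beta)


if __name__ == "__main__":
    pulls, posterior_means = thompson_sampling_bernoulli([0.2, 0.5, 0.8])
    print("pulls per arm:", pulls)
    print("posterior means:", posterior_means)
```

Exploration here comes entirely from the randomness of the posterior sample: arms with little data have wide posteriors, so they occasionally produce the largest sample and get pulled.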
You can also use TS for the full RL problem, but that can lead to inefficient exploration. To learn more about this issue, you could read
- section 7.5, Reinforcement Learning in Markov Decision Processes (p. 62), of A Tutorial on Thompson Sampling (2017) by Russo et al.,
- my answer here, and
- the paper Deep Exploration via Randomized Value Functions (2019, JMLR) by Ian Osband et al.

nbro