
Agrawal and Goyal (http://proceedings.mlr.press/v23/agrawal12/agrawal12.pdf, page 3) discuss how Thompson sampling for Bernoulli bandits can be extended to general stochastic bandits with rewards $r_t \in [0,1]$: upon receiving reward $r_t$, draw a Bernoulli pseudo-reward with success probability $r_t$ and update the posterior exactly as in the Bernoulli case.
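To make the reduction concrete, here is a minimal sketch of that trick (the function names `bernoulli_trick_update` and `thompson_step` are my own, not from the paper): the observed reward $r_t \in [0,1]$ is replaced by a $\mathrm{Bernoulli}(r_t)$ sample, which keeps the Beta posterior updates of the Bernoulli algorithm unchanged.

```python
import random

def bernoulli_trick_update(alpha, beta, reward):
    """Update a Beta(alpha, beta) posterior from a reward in [0, 1]
    by first drawing a Bernoulli(reward) pseudo-reward."""
    pseudo = 1 if random.random() < reward else 0  # Bernoulli(r_t) sample
    return alpha + pseudo, beta + (1 - pseudo)

def thompson_step(posteriors, pull_arm):
    """One round of Thompson sampling.

    posteriors: list of (alpha, beta) pairs, one per arm.
    pull_arm:   callable arm_index -> reward in [0, 1].
    """
    # Sample a mean estimate from each arm's Beta posterior.
    samples = [random.betavariate(a, b) for a, b in posteriors]
    arm = max(range(len(samples)), key=samples.__getitem__)
    r = pull_arm(arm)  # observed (possibly non-binary) reward
    posteriors[arm] = bernoulli_trick_update(*posteriors[arm], r)
    return arm
```

For example, running `thompson_step` repeatedly on arms with mean rewards 0.9 and 0.1 concentrates pulls on the first arm, even though individual rewards need not be 0/1.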

My question is whether this reduction from general stochastic bandits to Bernoulli bandits holds in general, not only for Thompson sampling. For example, can properties such as regret lower bounds proved for Bernoulli bandits always be transferred to general stochastic bandits?

Felix P.
