
Agrawal and Goyal (http://proceedings.mlr.press/v23/agrawal12/agrawal12.pdf, page 3) discuss how Thompson sampling for Bernoulli bandits can be extended to general stochastic bandits with rewards $r_t \in [0,1]$: upon receiving reward $r_t$, draw a Bernoulli pseudo-reward with success probability $r_t$ and update the posterior exactly as in the Bernoulli case.
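To make the reduction concrete, here is a minimal sketch of that trick (the function names `bernoulli_trick_update` and `thompson_step` are my own, not from the paper): the observed reward $r_t \in [0,1]$ is replaced by a $\mathrm{Bernoulli}(r_t)$ sample, which keeps the Beta posterior updates of the Bernoulli algorithm unchanged.

```python
import random

def bernoulli_trick_update(alpha, beta, reward):
    """Update a Beta(alpha, beta) posterior from a reward in [0, 1]
    by first drawing a Bernoulli(reward) pseudo-reward."""
    pseudo = 1 if random.random() < reward else 0  # Bernoulli(r_t) sample
    return alpha + pseudo, beta + (1 - pseudo)

def thompson_step(posteriors, pull_arm):
    """One round of Thompson sampling.

    posteriors: list of (alpha, beta) pairs, one per arm.
    pull_arm:   callable arm_index -> reward in [0, 1].
    """
    # Sample a mean estimate from each arm's Beta posterior.
    samples = [random.betavariate(a, b) for a, b in posteriors]
    arm = max(range(len(samples)), key=samples.__getitem__)
    r = pull_arm(arm)  # observed (possibly non-binary) reward
    posteriors[arm] = bernoulli_trick_update(*posteriors[arm], r)
    return arm
```

For example, running `thompson_step` repeatedly on arms with mean rewards 0.9 and 0.1 concentrates pulls on the first arm, even though individual rewards need not be 0/1.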

My question is whether this reduction from general stochastic bandits to Bernoulli bandits holds in general, not only for Thompson sampling. For example, can properties such as regret lower bounds proved for Bernoulli bandits always be transferred to general stochastic bandits?

Felix P.
