2

I'm a bit confused about the visualization of the upper bound (following the notation of (c.f. Sutton & Barto (2018))

$$Q_t(a)+C\sqrt{\frac{\mathrm{ln}(t)}{N_t(a)}}$$

In many blog posts about the UCB(1)-algorithm, such as visualized in the following image (c.f. Link ):

Text

Isn't the upper (confidence) bound simply the upper bound of a one-sided confidence interval instead of a two-sided confidence interval as shown in the image above? A lower bound of the interval is completely useless in this case, or am I wrong?

nbro
  • 39,006
  • 12
  • 98
  • 176
D. B.
  • 101
  • 6
  • 1
    As far as I am aware it is for a two sided confidence interval. I only have a brief knowledge of MAB's so I don't want to write an official answer but I am fairly certain a paper I read on them a few years ago also proved a lower bound (I guess for completeness) because you are right in saying that a lower bound would be useless for a MAB. – David Jan 19 '21 at 21:08
  • 1
    In the MAB domain, it is possible to specify a lower as well as an upper bound for the regret of an algorithm. Probably you mix it up with that? – D. B. Jan 19 '21 at 21:39

1 Answers1

1

The upper bound used here is derived from Hoeffding's inequality, which provides a symmetric, two-sided confidence interval. A good pair of blog posts on how this bound used in UCB for bandits is derived can be found here:

  1. First steps: Explore-then-Commit
  2. The Upper Confidence Bound Algorithm

Indeed, in practice when using this UCB for bandits, we do not actually care about the lower bound. We only need the upper found for the exploration mechanism. But the lower bound does still exist, even if we don't use it.

Dennis Soemers
  • 9,894
  • 2
  • 25
  • 66
  • I didn't even understand the question though. A funny thing is that if you used the LCB as your measure to make decisions you will get the same decisions as UCB. –  Feb 01 '21 at 05:22