My question concerns stochastic combinatorial multi-armed bandits, more specifically the algorithm called CombUCB1 presented in this paper, which is a UCB-like algorithm.
Essentially, in each round of the sequential game the learner chooses a super-arm to play. A super-arm is a $d$-dimensional vector $a \in \mathcal{A}$, where $\mathcal{A} \subseteq \{0,1\}^d$. When the $i$-th element of a super-arm $a$ equals $1$ ($i \in \{1, \dots, d\}$, $a(i)=1$), the basic action $i$ is active. In each round the learner plays the basic actions that are active in the chosen super-arm. The rewards of the basic actions are stochastic, and the reward of a super-arm is the sum of the rewards of its active basic actions.
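For example (a toy instance in my own notation, not taken from the paper): with $d = 3$ and super-arm $a = (1, 0, 1)$, the learner plays basic actions $1$ and $3$ and receives $w_t(1) + w_t(3)$, where $w_t(i)$ denotes the stochastic reward of basic action $i$ in round $t$.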
In this algorithm, each basic action is associated with a UCB index, and in each round the learner plays the super-arm that maximises the sum of the indices of its active basic actions. My question concerns the confidence interval around the mean reward of the basic actions, presented in equation $(2)$ of the paper. There, the exploration bonus is
$c_{t,s} = \sqrt{\frac{1.5 \log t}{s}}$
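Just to make concrete how I read the algorithm, here is a minimal sketch in Python (the numbers and the function name `ucb_index` are my own; this is not the paper's code):

```python
import numpy as np

def ucb_index(mean_hat, pulls, t, coeff=1.5):
    """UCB index of a basic action: empirical mean plus exploration bonus c_{t,s}."""
    return mean_hat + np.sqrt(coeff * np.log(t) / pulls)

# Toy round with d = 3 basic actions (all values made up).
means = np.array([0.6, 0.4, 0.7])   # empirical mean rewards so far
pulls = np.array([5, 3, 8])         # number of observations of each basic action
t = 20                              # current round

indices = ucb_index(means, pulls, t)

# The learner plays the feasible super-arm maximising the sum of the
# indices of its active basic actions.
feasible = [np.array([1, 0, 1]), np.array([0, 1, 1])]
best = max(feasible, key=lambda a: float(a @ indices))
print(best, float(best @ indices))
```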
I don't understand where that $1.5$ comes from. I've always known that the exploration bonus in a UCB algorithm is derived using the Chernoff-Hoeffding inequality. Am I wrong, and does it need to be computed in some other way? I've always seen the same expression but with a $2$ instead of the $1.5$ (reference). Could someone explain where the $1.5$ comes from, please?
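To spell out what I have in mind (my own derivation, so this may well be where I go wrong): for rewards in $[0,1]$ and a basic action observed $s$ times, Hoeffding's inequality gives

$$\mathbb{P}\left(\hat{\mu}_s - \mu \ge \sqrt{\tfrac{\alpha \log t}{2 s}}\right) \le \exp\left(-2 s \cdot \tfrac{\alpha \log t}{2 s}\right) = t^{-\alpha},$$

and the bonus $\sqrt{2 \log t / s}$ that I am used to corresponds to $\alpha = 4$. Applying the same recipe, $\sqrt{1.5 \log t / s}$ would correspond to $\alpha = 3$, i.e. a failure probability of $t^{-3}$, but I don't know whether that is actually the reasoning behind the $1.5$.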
I know there is a similar question here, but I cannot really see how the argument there applies to this case.
Thank you in advance for taking the time to read and answer my question.