For questions about Thompson sampling, a technique for choosing actions that addresses the exploration-exploitation dilemma in multi-armed bandit and reinforcement learning problems.
Questions tagged [thompson-sampling]
14 questions
8
votes
0 answers
Normalizing Normal Distributions in Thompson Sampling for online Reinforcement Learning
In my implementation of Thompson Sampling (TS) for online Reinforcement Learning, my distribution for selecting $a$ is $\mathcal{N}(Q(s, a), \frac{1}{C(s,a)+1})$, where $C(s,a)$ is the number of times $a$ has been picked in $s$.
However, I found…

Kevin
- 81
- 2
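The selection rule described in the question can be sketched as follows; `Q`, `C`, and the toy values below are hypothetical stand-ins, not the asker's actual implementation:

```python
import numpy as np

def gaussian_ts_action(Q, C, state, rng=np.random.default_rng()):
    """Sample one value per action from N(Q[s, a], 1 / (C[s, a] + 1))
    and pick the argmax, as in the question's selection rule."""
    std = np.sqrt(1.0 / (C[state] + 1))   # variance is 1 / (C(s, a) + 1)
    samples = rng.normal(loc=Q[state], scale=std)
    return int(np.argmax(samples))

# Toy usage: one state, three actions (values are made up)
Q = np.array([[1.0, 0.5, 0.2]])   # estimated action values Q(s, a)
C = np.array([[100, 2, 0]])       # visit counts C(s, a)
action = gaussian_ts_action(Q, C, state=0)
```

Note how rarely-tried actions get a wide distribution, so they can still win the argmax despite a lower mean.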
4
votes
1 answer
How to compute the action probabilities with Thompson sampling in deep Q-learning?
In some implementations of off-policy Q-learning, we need to know the action probabilities given by the behavior policy $\mu(a)$ (e.g., if we want to use importance sampling).
In my case, I am using Deep Q-Learning and selecting actions using…

nicolas
- 43
- 2
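Since a deep Q-network gives no closed form for $\mu(a)$ under Thompson sampling, one workaround is a Monte Carlo estimate: draw many posterior samples and count how often each action wins the argmax. The `sample_q_values` callable below is a hypothetical stand-in for a stochastic forward pass, not an API from any particular library:

```python
import numpy as np

def ts_action_probs(sample_q_values, n_samples=1000):
    """Monte Carlo estimate of mu(a): the frequency with which Thompson
    sampling would pick each action, using repeated posterior draws."""
    counts = np.zeros_like(sample_q_values())
    for _ in range(n_samples):
        counts[np.argmax(sample_q_values())] += 1
    return counts / n_samples

# Hypothetical posterior over Q-values: independent Gaussians per action,
# standing in for a stochastic pass of a Bayesian/dropout Q-network.
rng = np.random.default_rng(0)
probs = ts_action_probs(lambda: rng.normal([1.0, 0.9, 0.0], 0.3))
```

The estimate converges as `n_samples` grows, at the cost of extra forward passes per step.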
4
votes
0 answers
What is the Thompson Sampling in simple terms?
I am looking at the different existing methods of action selection in reinforcement learning.
I found several methods like epsilon-greedy, softmax, upper confidence bound and Thompson sampling.
I managed to understand the principle of each method…

user14053977
- 41
- 1
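In simple terms, for Bernoulli rewards Thompson sampling keeps a Beta posterior per arm, draws one sample from each posterior, and pulls the arm with the highest sample. A minimal sketch with made-up payout rates:

```python
import numpy as np

rng = np.random.default_rng(0)
true_probs = np.array([0.2, 0.5, 0.8])   # hidden payout rate of each arm
successes = np.zeros(3)
failures = np.zeros(3)

for _ in range(2000):
    # One posterior sample per arm; an uncertain (wide) posterior can win
    # the argmax, which is what makes under-explored arms get tried.
    samples = rng.beta(successes + 1, failures + 1)
    arm = int(np.argmax(samples))
    reward = float(rng.random() < true_probs[arm])
    successes[arm] += reward
    failures[arm] += 1 - reward

pulls = successes + failures   # the best arm ends up pulled most often
```

Unlike epsilon-greedy, exploration here falls away naturally as the posteriors sharpen.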
4
votes
2 answers
Should I use exploration strategy in Policy Gradient algorithms?
In policy gradient algorithms the output is a stochastic policy: a probability for each action.
I believe that if I follow the policy (sample an action from the policy) I make use of exploration because each action has a certain probability so I…

gnikol
- 175
- 7
3
votes
3 answers
Why aren't exploration techniques, such as UCB or Thompson sampling, used in full RL problems?
Why aren't exploration techniques such as UCB or Thompson sampling, which are typically used in bandit problems, used in full RL problems?
Monte Carlo Tree Search may use the above-mentioned methods in its selection step, but why do value-based and policy…

Mika
- 331
- 1
- 8
2
votes
0 answers
Minimum sampling for maximising the prediction accuracy
Suppose that I'm training a machine learning model to predict people's age from a picture of their face. Let's say that I have a dataset of people from 1-year-olds to 100-year-olds. But I want to choose just 9 (arbitrary) ages out of this 100 age…

noone
- 123
- 4
1
vote
0 answers
Data Imbalance in Contextual Bandit with Thompson Sampling
I'm working with the Online Logistic Regression Algorithm (Algorithm 3) of Chapelle and Li in their paper, "An Empirical Evaluation of Thompson Sampling" (https://papers.nips.cc/paper/2011/file/e53a0a2978c28872a4505bdb51db06dc-Paper.pdf). It's a…

MABQ
- 11
- 1
1
vote
0 answers
Multi-armed bandits: reducing stochastic multi-armed bandits to Bernoulli bandits
Agrawal and Goyal (http://proceedings.mlr.press/v23/agrawal12/agrawal12.pdf page 3) discussed how we can extend Thompson sampling for Bernoulli bandits to Thompson sampling for stochastic bandits in general by simply Bernoulli sampling with the…

Felix P.
- 287
- 1
- 6
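The reduction referenced in the question can be sketched as follows (a hedged reading of Agrawal and Goyal's construction, with hypothetical variable names): after observing a reward $r \in [0, 1]$, perform a Bernoulli trial with success probability $r$ and update the Beta posterior with the binary outcome rather than with $r$ itself.

```python
import numpy as np

def ts_update_general_reward(alpha, beta_, arm, reward, rng=np.random.default_rng()):
    """Observe a reward r in [0, 1], flip a Bernoulli(r) coin, and update
    the arm's Beta posterior with the binary outcome instead of r."""
    pseudo = float(rng.random() < reward)   # Bernoulli trial with p = r
    alpha[arm] += pseudo
    beta_[arm] += 1.0 - pseudo

alpha = np.ones(3)   # Beta(1, 1) priors for 3 arms
beta_ = np.ones(3)
ts_update_general_reward(alpha, beta_, arm=0, reward=0.7)
# exactly one pseudo-observation is added to arm 0's posterior
```

The pseudo-reward has the same mean as the true reward, which is why the Bernoulli analysis carries over.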
1
vote
1 answer
Why am I getting better performance with Thompson sampling than with UCB or $\epsilon$-greedy in a multi-armed bandit problem?
I ran a test using 3 strategies for multi-armed bandit: UCB, $\epsilon$-greedy, and Thompson sampling.
The results for the rewards I got are as follows:
Thompson sampling had the highest average reward
UCB was second
$\epsilon$-greedy was third,…

Java coder
- 11
- 1
- 2
0
votes
0 answers
How to use UCB or TS in linear programming?
Consider a sequential decision-making problem over $T$ periods in which the parameters of the problem must be learned while an objective function is optimized. One possibility is to model the problem as a dynamic program and use RL techniques to solve…

Amin
- 471
- 2
- 11
0
votes
0 answers
Thompson sampling, is it accurate for smaller sample sizes?
For example, are 500 samples enough? I tried this code
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
dataset = pd.read_csv('Ads_CTR_Optimisation.csv')
# Implementing Thompson Sampling
import random
N = 499
d =…

Alex
- 1
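The truncated snippet can be made self-contained by simulating the click data instead of reading Ads_CTR_Optimisation.csv; the click-through rates below are made up, and the sketch only illustrates that with N = 499 rounds the resulting arm ranking may still be noisy:

```python
import numpy as np

rng = np.random.default_rng(42)
ctr = np.array([0.05, 0.13, 0.10])   # made-up click-through rates per ad
d = len(ctr)                         # number of ads
N = 499                              # small sample size, as in the question

wins = np.zeros(d)
losses = np.zeros(d)
for _ in range(N):
    theta = rng.beta(wins + 1, losses + 1)   # one posterior draw per ad
    ad = int(np.argmax(theta))
    click = float(rng.random() < ctr[ad])
    wins[ad] += click
    losses[ad] += 1 - click

pulls = wins + losses   # with N this small, the ranking can still be noisy
```

Rerunning with different seeds shows how much the small-sample ranking varies, which is one way to answer the accuracy question empirically.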
0
votes
0 answers
Is there a variant of Thompson Sampling that works with variable bandits?
Does there exist a variant of TS, such that, while computing the returns of multi-armed bandits, we have the possibility of introducing an extra bandit?
For instance, while we are applying TS to 3 slot machines, we come to know about the existence…

desert_ranger
- 586
- 3
- 19
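One hedged answer sketch: because the posterior state in Thompson sampling is kept per arm, an extra bandit can be introduced mid-run simply by appending a fresh uninformative prior; the new arm's wide posterior then drives its exploration automatically. The variable names below are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
alpha = np.ones(3)   # Beta(1, 1) posterior parameters for 3 slot machines
beta_ = np.ones(3)

def pull(alpha, beta_):
    """Standard TS step: one Beta sample per arm, pull the argmax."""
    samples = rng.beta(alpha, beta_)
    return int(np.argmax(samples))

# A 4th machine is discovered mid-run: just append a fresh Beta(1, 1)
alpha = np.append(alpha, 1.0)
beta_ = np.append(beta_, 1.0)
arm = pull(alpha, beta_)   # the new arm is now eligible for selection
```

No special variant is needed for this case, though formal regret guarantees for a changing arm set are a separate question.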
0
votes
1 answer
Why is Thompson Sampling considered a part of Reinforcement Learning?
I often see Thompson Sampling in the RL literature; however, I am not able to relate it to any of the current RL techniques. How exactly does it fit with RL?

desert_ranger
- 586
- 3
- 19
0
votes
0 answers
Thompson sampling with Bernoulli prior and non-binary reward update
I am solving a problem in which I have to select the best possible servers (level 1) to hit for a given piece of data. These servers (level 1) in turn hit some other servers (level 2) to complete the request. The level 1 servers have the same set of level 2…