Questions tagged [thompson-sampling]

For questions about Thompson sampling, an action-selection technique that addresses the exploration-exploitation dilemma in multi-armed bandit and reinforcement learning problems.

14 questions
8 votes · 0 answers

Normalizing Normal Distributions in Thompson Sampling for online Reinforcement Learning

In my implementation of Thompson Sampling (TS) for online Reinforcement Learning, my distribution for selecting $a$ is $\mathcal{N}(Q(s, a), \frac{1}{C(s,a)+1})$, where $C(s,a)$ is the number of times $a$ has been picked in $s$. However, I found…
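
A minimal sketch of this kind of action selection, assuming a tabular setting where `Q` and `C` are arrays indexed by (state, action); the $\mathcal{N}(Q(s,a), \frac{1}{C(s,a)+1})$ form comes from the question, everything else (shapes, names) is an assumption:

```python
import numpy as np

def thompson_select(Q, C, s, rng=np.random.default_rng()):
    """Pick an action by drawing one value per action from
    N(Q[s, a], 1 / (C[s, a] + 1)) and returning the argmax."""
    scale = np.sqrt(1.0 / (C[s] + 1.0))   # std dev shrinks as a is tried more often in s
    samples = rng.normal(loc=Q[s], scale=scale)
    return int(np.argmax(samples))

# hypothetical usage: 5 states, 3 actions
Q = np.zeros((5, 3))
C = np.zeros((5, 3))
a = thompson_select(Q, C, s=0)
C[0, a] += 1
```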
4 votes · 1 answer

How to compute the action probabilities with Thompson sampling in deep Q-learning?

In some implementations of off-policy Q-learning, we need to know the action probabilities given by the behavior policy $\mu(a)$ (e.g., if we want to use importance sampling). In my case, I am using Deep Q-Learning and selecting actions using…
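
One hedged way to approximate $\mu(a)$ under a Thompson-style selection rule is to repeat the sampling step many times and count how often each action wins; this is only a Monte Carlo sketch under an assumed Gaussian posterior per action, not the asker's exact setup:

```python
import numpy as np

def ts_action_probs(mean_q, var_q, n_draws=10_000, rng=np.random.default_rng()):
    """Estimate P(a is argmax) when each Q(s, a) is drawn from N(mean_q[a], var_q[a])."""
    draws = rng.normal(mean_q, np.sqrt(var_q), size=(n_draws, len(mean_q)))
    winners = draws.argmax(axis=1)
    return np.bincount(winners, minlength=len(mean_q)) / n_draws

# hypothetical posterior over 3 actions
probs = ts_action_probs(mean_q=np.array([1.0, 1.2, 0.8]),
                        var_q=np.array([0.5, 0.5, 0.5]))
```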
4 votes · 0 answers

What is Thompson Sampling in simple terms?

I am looking at the different existing methods of action selection in reinforcement learning. I found several methods like epsilon-greedy, softmax, upper confidence bound and Thompson sampling. I managed to understand the principle of each method…
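
In simple terms, for Bernoulli rewards Thompson sampling keeps a Beta posterior per arm, samples one success probability from each posterior, and plays the arm whose sample is largest. A minimal sketch (the arm probabilities below are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
true_p = [0.3, 0.5, 0.7]           # hypothetical arm success probabilities
alpha = np.ones(3)                 # Beta(1, 1) prior per arm
beta = np.ones(3)

for _ in range(1000):
    theta = rng.beta(alpha, beta)  # one belief sample per arm
    arm = int(np.argmax(theta))    # act greedily w.r.t. the sampled beliefs
    reward = rng.random() < true_p[arm]
    alpha[arm] += reward           # posterior update
    beta[arm] += 1 - reward
```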
4 votes · 2 answers

Should I use an exploration strategy in Policy Gradient algorithms?

In policy gradient algorithms, the output is a stochastic policy: a probability for each action. I believe that if I follow the policy (sample an action from it), I already make use of exploration, because each action has a certain probability, so I…
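
As a hedged illustration of the point in the excerpt: sampling actions from the policy's own distribution already provides exploration, because non-greedy actions keep non-zero probability; the softmax policy and logits here are assumed stand-ins:

```python
import numpy as np

rng = np.random.default_rng()
logits = np.array([2.0, 1.0, 0.5])        # hypothetical policy network output
probs = np.exp(logits - logits.max())
probs /= probs.sum()                       # softmax -> stochastic policy

action = rng.choice(len(probs), p=probs)   # sampling (not argmax) is what explores
entropy = -(probs * np.log(probs)).sum()   # often added as a bonus to keep the policy from collapsing
```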
3 votes · 3 answers

Why aren't exploration techniques, such as UCB or Thompson sampling, used in full RL problems?

Why aren't exploration techniques, such as UCB or Thompson sampling, which are typically used in bandit problems, also used in full RL problems? Monte Carlo Tree Search may use the above-mentioned methods in its selection step, but why do value-based and policy…
2 votes · 0 answers

Minimum sampling for maximising the prediction accuracy

Suppose that I'm training a machine learning model to predict people's age from a picture of their faces. Let's say that I have a dataset of people from 1-year-olds to 100-year-olds. But I want to choose just 9 (arbitrary) ages out of this 100 age…
1 vote · 0 answers

Data Imbalance in Contextual Bandit with Thompson Sampling

I'm working with the Online Logistic Regression Algorithm (Algorithm 3) of Chapelle and Li in their paper, "An Empirical Evaluation of Thompson Sampling" (https://papers.nips.cc/paper/2011/file/e53a0a2978c28872a4505bdb51db06dc-Paper.pdf). It's a…
1 vote · 0 answers

Multi-armed bandits: reducing stochastic multi-armed bandits to Bernoulli bandits

Agrawal and Goyal (http://proceedings.mlr.press/v23/agrawal12/agrawal12.pdf, page 3) discussed how we can extend Thompson sampling for Bernoulli bandits to Thompson sampling for stochastic bandits in general by simply Bernoulli sampling with the…
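
The reduction in that paper replaces a reward $r \in [0, 1]$ with a Bernoulli trial whose success probability is $r$, so the usual Beta-Bernoulli update still applies. A sketch of that single step; everything outside the resampling trick (names, number of arms) is assumed:

```python
import numpy as np

rng = np.random.default_rng()

def update_with_general_reward(alpha, beta, arm, reward):
    """reward is any value in [0, 1]; flip a coin with that bias
    and update the Beta posterior as if the coin flip were the reward."""
    pseudo_reward = rng.random() < reward   # Bernoulli(reward) trial
    alpha[arm] += pseudo_reward
    beta[arm] += 1 - pseudo_reward

alpha, beta = np.ones(3), np.ones(3)
update_with_general_reward(alpha, beta, arm=1, reward=0.64)
```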
1 vote · 1 answer

Why am I getting better performance with Thompson sampling than with UCB or $\epsilon$-greedy in a multi-armed bandit problem?

I ran a test using 3 strategies for the multi-armed bandit: UCB, $\epsilon$-greedy, and Thompson sampling. The results for the rewards I got are as follows: Thompson sampling had the highest average reward, UCB was second, and $\epsilon$-greedy was third,…
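
One way to sanity-check such a comparison is a small Bernoulli test-bed; the arm probabilities, horizon, $\epsilon$, and UCB bonus below are all assumptions, and a single run will not reproduce the questioner's exact numbers:

```python
import numpy as np

rng = np.random.default_rng(1)
true_p = np.array([0.2, 0.5, 0.55])   # hypothetical arm means
T = 5000

def run(policy):
    alpha, beta = np.ones(3), np.ones(3)          # Beta(1, 1) posteriors
    total = 0.0
    for t in range(1, T + 1):
        n = alpha + beta - 2                      # pulls per arm so far
        mean = np.where(n > 0, (alpha - 1) / np.maximum(n, 1), 0.0)
        if policy == "thompson":
            arm = int(np.argmax(rng.beta(alpha, beta)))
        elif policy == "ucb":
            bonus = np.sqrt(2 * np.log(t) / np.maximum(n, 1e-9))
            arm = int(np.argmax(np.where(n == 0, np.inf, mean + bonus)))
        else:  # epsilon-greedy with epsilon = 0.1
            arm = int(rng.integers(3)) if rng.random() < 0.1 else int(np.argmax(mean))
        r = rng.random() < true_p[arm]
        alpha[arm] += r
        beta[arm] += 1 - r
        total += r
    return total / T

for name in ("thompson", "ucb", "epsilon-greedy"):
    print(name, run(name))
```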
0 votes · 0 answers

How to use UCB or TS in linear programming?

Consider a sequential decision-making problem over $T$ periods in which the parameters of the problem must be learned while also optimizing an objective function. One possibility is to model the problem as a dynamic program and use RL techniques to solve…
0 votes · 0 answers

Thompson sampling, is it accurate for smaller sample sizes?

For example, are 500 samples enough? I tried this code: import numpy as np import matplotlib.pyplot as plt import pandas as pd dataset = pd.read_csv('Ads_CTR_Optimisation.csv') # Implementing Thompson Sampling import random N = 499 d =…
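
A runnable version of the kind of loop the excerpt starts, hedged: the dataset name and N = 499 come from the question, but the number of ads d = 10 and the column layout (one 0/1 click column per ad) are assumptions, so treat this as a sketch rather than the asker's exact code:

```python
import random
import pandas as pd

dataset = pd.read_csv('Ads_CTR_Optimisation.csv')   # assumed: rows = rounds, columns = 0/1 clicks per ad
N = 499                                             # number of rounds used, as in the excerpt
d = 10                                              # assumed number of ads

ads_selected = []
numbers_of_rewards_1 = [0] * d   # observed clicks per ad
numbers_of_rewards_0 = [0] * d   # observed non-clicks per ad

for n in range(N):
    # draw one Beta sample per ad and pick the highest
    draws = [random.betavariate(numbers_of_rewards_1[i] + 1,
                                numbers_of_rewards_0[i] + 1) for i in range(d)]
    ad = draws.index(max(draws))
    ads_selected.append(ad)
    reward = dataset.values[n, ad]
    if reward == 1:
        numbers_of_rewards_1[ad] += 1
    else:
        numbers_of_rewards_0[ad] += 1
```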
0 votes · 0 answers

Is there a variant of Thompson Sampling that works with variable bandits?

Does there exist a variant of TS such that, while computing the returns of multi-armed bandits, we have the possibility of introducing an extra bandit? For instance, while we are applying TS to 3 slot machines, we come to know about the existence…
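
Under a Beta-Bernoulli model, one hedged way to handle an extra slot machine appearing mid-run is simply to append a fresh prior for it; whether that prior should be uniform or informed by the existing arms is exactly the open question, and the Beta(1, 1) choice below is only an assumption:

```python
import numpy as np

rng = np.random.default_rng()
alpha = [1.0, 1.0, 1.0]   # Beta(1, 1) priors for the original 3 machines
beta = [1.0, 1.0, 1.0]

def add_arm(alpha, beta, a0=1.0, b0=1.0):
    """Introduce a new machine with its own (assumed uniform) prior."""
    alpha.append(a0)
    beta.append(b0)

add_arm(alpha, beta)              # a 4th machine becomes available

# selection works exactly as before, now over 4 arms
theta = rng.beta(alpha, beta)
arm = int(np.argmax(theta))
```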
0 votes · 1 answer

Why is Thompson Sampling considered a part of Reinforcement Learning?

I often see Thompson Sampling in the RL literature; however, I am not able to relate it to any of the current RL techniques. How exactly does it fit with RL?
0 votes · 0 answers

Thompson sampling with Bernoulli prior and non-binary reward update

I am solving a problem for which I have to select the best possible servers (level 1) to hit for given data. These servers (level 1) in turn hit some other servers (level 2) to complete the request. The level 1 servers have the same set of level 2…