For questions about Thompson sampling, a technique for choosing actions that addresses the exploration-exploitation dilemma in multi-armed bandit and reinforcement learning problems.
Questions tagged [thompson-sampling]
14 questions
8
votes
0 answers
Normalizing Normal Distributions in Thompson Sampling for online Reinforcement Learning
In my implementation of Thompson Sampling (TS) for online Reinforcement Learning, my distribution for selecting $a$ is $\mathcal{N}(Q(s, a), \frac{1}{C(s,a)+1})$, where $C(s,a)$ is the number of times $a$ has been picked in $s$.
However, I found…

Kevin
- 81
- 2
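The selection rule described in the question can be sketched as follows; `Q`, `C`, and the toy values below are hypothetical stand-ins, not the asker's actual implementation:

```python
import numpy as np

def gaussian_ts_action(Q, C, state, rng=np.random.default_rng()):
    """Sample one value per action from N(Q[s, a], 1 / (C[s, a] + 1))
    and pick the argmax, as in the question's selection rule."""
    std = np.sqrt(1.0 / (C[state] + 1))   # variance is 1 / (C(s, a) + 1)
    samples = rng.normal(loc=Q[state], scale=std)
    return int(np.argmax(samples))

# Toy usage: one state, three actions (values are made up)
Q = np.array([[1.0, 0.5, 0.2]])   # estimated action values Q(s, a)
C = np.array([[100, 2, 0]])       # visit counts C(s, a)
action = gaussian_ts_action(Q, C, state=0)
```

Note how rarely-tried actions get a wide distribution, so they can still win the argmax despite a lower mean.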
4
votes
1 answer
How to compute the action probabilities with Thompson sampling in deep Q-learning?
In some implementations of off-policy Q-learning, we need to know the action probabilities given by the behavior policy $\mu(a)$ (e.g., if we want to use importance sampling).
In my case, I am using Deep Q-Learning and selecting actions using…

nicolas
- 43
- 2
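Since a deep Q-network gives no closed form for $\mu(a)$ under Thompson sampling, one workaround is a Monte Carlo estimate: draw many posterior samples and count how often each action wins the argmax. The `sample_q_values` callable below is a hypothetical stand-in for a stochastic forward pass, not an API from any particular library:

```python
import numpy as np

def ts_action_probs(sample_q_values, n_samples=1000):
    """Monte Carlo estimate of mu(a): the frequency with which Thompson
    sampling would pick each action, using repeated posterior draws."""
    counts = np.zeros_like(sample_q_values())
    for _ in range(n_samples):
        counts[np.argmax(sample_q_values())] += 1
    return counts / n_samples

# Hypothetical posterior over Q-values: independent Gaussians per action,
# standing in for a stochastic pass of a Bayesian/dropout Q-network.
rng = np.random.default_rng(0)
probs = ts_action_probs(lambda: rng.normal([1.0, 0.9, 0.0], 0.3))
```

The estimate converges as `n_samples` grows, at the cost of extra forward passes per step.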
4
votes
0 answers
What is the Thompson Sampling in simple terms?
I am looking at the different existing methods of action selection in reinforcement learning.
I found several methods like epsilon-greedy, softmax, upper confidence bound and Thompson sampling.
I managed to understand the principle of each method…

user14053977
- 41
- 1
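In simple terms, for Bernoulli rewards Thompson sampling keeps a Beta posterior per arm, draws one sample from each posterior, and pulls the arm with the highest sample. A minimal sketch with made-up payout rates:

```python
import numpy as np

rng = np.random.default_rng(0)
true_probs = np.array([0.2, 0.5, 0.8])   # hidden payout rate of each arm
successes = np.zeros(3)
failures = np.zeros(3)

for _ in range(2000):
    # One posterior sample per arm; an uncertain (wide) posterior can win
    # the argmax, which is what makes under-explored arms get tried.
    samples = rng.beta(successes + 1, failures + 1)
    arm = int(np.argmax(samples))
    reward = float(rng.random() < true_probs[arm])
    successes[arm] += reward
    failures[arm] += 1 - reward

pulls = successes + failures   # the best arm ends up pulled most often
```

Unlike epsilon-greedy, exploration here falls away naturally as the posteriors sharpen.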
4
votes
2 answers
Should I use exploration strategy in Policy Gradient algorithms?
In policy gradient algorithms the output is a stochastic policy: a probability for each action.
I believe that if I follow the policy (sample an action from the policy) I make use of exploration because each action has a certain probability so I…

gnikol
- 175
- 7
3
votes
3 answers
Why aren't exploration techniques, such as UCB or Thompson sampling, used in full RL problems?
Why aren't exploration techniques such as UCB or Thompson sampling, which are typically used in bandit problems, used in full RL problems?
Monte Carlo Tree Search may use the above-mentioned methods in its selection step, but why do value-based and policy…

Mika
- 331
- 1
- 8
2
votes
0 answers
Minimum sampling for maximising the prediction accuracy
Suppose that I'm training a machine learning model to predict people's age from a picture of their face. Let's say that I have a dataset of people from 1-year-olds to 100-year-olds. But I want to choose just 9 (arbitrary) ages out of this 100 age…

noone
- 123
- 4
1
vote
0 answers
Data Imbalance in Contextual Bandit with Thompson Sampling
I'm working with the Online Logistic Regression Algorithm (Algorithm 3) of Chapelle and Li in their paper, "An Empirical Evaluation of Thompson Sampling" (https://papers.nips.cc/paper/2011/file/e53a0a2978c28872a4505bdb51db06dc-Paper.pdf). It's a…

MABQ
- 11
- 1
1
vote
0 answers
Multi-armed bandits: reducing stochastic multi-armed bandits to Bernoulli bandits
Agrawal and Goyal (http://proceedings.mlr.press/v23/agrawal12/agrawal12.pdf page 3) discussed how we can extend Thompson sampling for Bernoulli bandits to Thompson sampling for stochastic bandits in general by simply Bernoulli sampling with the…

Felix P.
- 287
- 1
- 6
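The reduction referenced in the question can be sketched as follows (a hedged reading of Agrawal and Goyal's construction, with hypothetical variable names): after observing a reward $r \in [0, 1]$, perform a Bernoulli trial with success probability $r$ and update the Beta posterior with the binary outcome rather than with $r$ itself.

```python
import numpy as np

def ts_update_general_reward(alpha, beta_, arm, reward, rng=np.random.default_rng()):
    """Observe a reward r in [0, 1], flip a Bernoulli(r) coin, and update
    the arm's Beta posterior with the binary outcome instead of r."""
    pseudo = float(rng.random() < reward)   # Bernoulli trial with p = r
    alpha[arm] += pseudo
    beta_[arm] += 1.0 - pseudo

alpha = np.ones(3)   # Beta(1, 1) priors for 3 arms
beta_ = np.ones(3)
ts_update_general_reward(alpha, beta_, arm=0, reward=0.7)
# exactly one pseudo-observation is added to arm 0's posterior
```

The pseudo-reward has the same mean as the true reward, which is why the Bernoulli analysis carries over.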
1
vote
1 answer
Why am I getting better performance with Thompson sampling than with UCB or $\epsilon$-greedy in a multi-armed bandit problem?
I ran a test using 3 strategies for multi-armed bandit: UCB, $\epsilon$-greedy, and Thompson sampling.
The results for the rewards I got are as follows:
Thompson sampling had the highest average reward
UCB was second
$\epsilon$-greedy was third,…

Java coder
- 11
- 1
- 2
0
votes
0 answers
How to use UCB or TS in linear programming?
Consider a sequential decision-making problem over $T$ periods in which the parameters of the problem must be learned while an objective function is optimized. One possibility is to model the problem as a dynamic program and use RL techniques to solve…

Amin
- 471
- 2
- 11
0
votes
0 answers
Thompson sampling, is it accurate for smaller sample sizes?
For example, are 500 samples enough? I tried this code
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
dataset = pd.read_csv('Ads_CTR_Optimisation.csv')
# Implementing Thompson Sampling
import random
N = 499
d =…

Alex
- 1
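The truncated snippet can be made self-contained by simulating the click data instead of reading Ads_CTR_Optimisation.csv; the click-through rates below are made up, and the sketch only illustrates that with N = 499 rounds the resulting arm ranking may still be noisy:

```python
import numpy as np

rng = np.random.default_rng(42)
ctr = np.array([0.05, 0.13, 0.10])   # made-up click-through rates per ad
d = len(ctr)                         # number of ads
N = 499                              # small sample size, as in the question

wins = np.zeros(d)
losses = np.zeros(d)
for _ in range(N):
    theta = rng.beta(wins + 1, losses + 1)   # one posterior draw per ad
    ad = int(np.argmax(theta))
    click = float(rng.random() < ctr[ad])
    wins[ad] += click
    losses[ad] += 1 - click

pulls = wins + losses   # with N this small, the ranking can still be noisy
```

Rerunning with different seeds shows how much the small-sample ranking varies, which is one way to answer the accuracy question empirically.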
0
votes
0 answers
Is there a variant of Thompson Sampling that works with variable bandits?
Does there exist a variant of TS, such that, while computing the returns of multi-armed bandits, we have the possibility of introducing an extra bandit?
For instance, while we are applying TS to 3 slot machines, we come to know about the existence…

desert_ranger
- 586
- 3
- 19
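One hedged answer sketch: because the posterior state in Thompson sampling is kept per arm, an extra bandit can be introduced mid-run simply by appending a fresh uninformative prior; the new arm's wide posterior then drives its exploration automatically. The variable names below are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
alpha = np.ones(3)   # Beta(1, 1) posterior parameters for 3 slot machines
beta_ = np.ones(3)

def pull(alpha, beta_):
    """Standard TS step: one Beta sample per arm, pull the argmax."""
    samples = rng.beta(alpha, beta_)
    return int(np.argmax(samples))

# A 4th machine is discovered mid-run: just append a fresh Beta(1, 1)
alpha = np.append(alpha, 1.0)
beta_ = np.append(beta_, 1.0)
arm = pull(alpha, beta_)   # the new arm is now eligible for selection
```

No special variant is needed for this case, though formal regret guarantees for a changing arm set are a separate question.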
0
votes
1 answer
Why is Thompson Sampling considered a part of Reinforcement Learning?
I often see Thompson Sampling in the RL literature; however, I am not able to relate it to any of the current RL techniques. How exactly does it fit with RL?

desert_ranger
- 586
- 3
- 19
0
votes
0 answers
Thompson sampling with Bernoulli prior and non-binary reward update
I am solving a problem in which I have to select the best possible servers (level 1) to hit for a given piece of data. These servers (level 1) in turn hit some other servers (level 2) to complete the request. The level 1 servers have the same set of level 2…