For questions about the contextual bandit (CB) problem and algorithms that solve it. The CB problem is a generalization of the (context-free) multi-armed bandit problem: there is more than one situation (or state), and the optimal action in one state may differ from the optimal action in another, but, unlike in the full reinforcement learning problem, the actions affect only the rewards, not the states.
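A minimal sketch of that interaction loop, assuming epsilon-greedy exploration with one linear reward estimate per arm (all dimensions, names, and the simulated environment below are illustrative, not taken from any specific question):

```python
import numpy as np

# Minimal contextual-bandit loop: the context is drawn fresh each round
# (it is NOT affected by the chosen action), and only the reward depends
# on the (context, action) pair. Epsilon-greedy over per-arm linear models.
rng = np.random.default_rng(0)
n_arms, n_features, epsilon = 3, 4, 0.1

A = [np.eye(n_features) for _ in range(n_arms)]      # running X^T X + I per arm
b = [np.zeros(n_features) for _ in range(n_arms)]    # running X^T r per arm

true_theta = rng.normal(size=(n_arms, n_features))   # hidden reward parameters

for t in range(1000):
    x = rng.normal(size=n_features)                  # context for this round
    estimates = [np.linalg.solve(A[a], b[a]) @ x for a in range(n_arms)]
    if rng.random() < epsilon:
        a = int(rng.integers(n_arms))                # explore
    else:
        a = int(np.argmax(estimates))                # exploit
    r = true_theta[a] @ x + rng.normal(scale=0.1)    # reward depends on (x, a) only
    A[a] += np.outer(x, x)                           # update only the chosen arm
    b[a] += r * x
```
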
Questions tagged [contextual-bandits]
21 questions
8
votes
2 answers
What is the relation between the context in contextual bandits and the state in reinforcement learning?
Conceptually, in general, how is the context handled in contextual bandits (CB), compared to states in reinforcement learning (RL)?
Specifically, in RL, we can use a function approximator (e.g. a neural network) to generalize to other states.…

Maxim Volgin
- 183
- 2
- 8
5
votes
1 answer
Can you convert an MDP problem to a Contextual Multi-Arm Bandits problem?
I'm trying to get a better understanding of Multi-Arm Bandits, Contextual Multi-Arm Bandits and Markov Decision Process.
Basically, Multi-Arm Bandits is a special case of Contextual Multi-Arm Bandits where there is no state (features/context). And…

peidaqi
- 151
- 1
5
votes
2 answers
Are bandits considered an RL approach?
If a research paper uses multi-armed bandits (either in their standard or contextual form) to solve a particular task, can we say that they solved this task using a reinforcement learning approach? Or should we distinguish between the two and use…

user5093249
- 722
- 4
- 8
3
votes
1 answer
How to implement a contextual reinforcement learning model?
In a reinforcement learning model, states depend on the previously chosen actions. In the case where some of the states (but not all) are fully independent of the actions, yet still obviously determine the optimal actions, how could we take these…

freesoul
- 246
- 1
- 5
3
votes
1 answer
Can I apply DQN or policy gradient algorithms in the contextual bandit setting?
I have a problem which I believe can be described as a contextual bandit.
More specifically, in each round, I observe a context from the environment consisting of five continuous features, and, based on the context, I have to choose one of the ten…

gnikol
- 175
- 7
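One common reading of this setting is that each round is a one-step episode, so a policy-gradient update needs no bootstrapping at all. A rough sketch along those lines, taking only the five-feature/ten-arm numbers from the question and inventing everything else (the simulated environment, learning rate, and baseline):

```python
import numpy as np

# Each round is a one-step episode: observe a 5-dimensional context, pick one
# of 10 arms, receive a reward, and update a softmax policy with REINFORCE.
rng = np.random.default_rng(1)
n_features, n_arms, lr = 5, 10, 0.05
W = np.zeros((n_arms, n_features))                 # linear policy parameters
baseline = 0.0                                     # running average reward

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

true_theta = rng.normal(size=(n_arms, n_features))  # hypothetical environment

for t in range(5000):
    x = rng.normal(size=n_features)                # context from the environment
    probs = softmax(W @ x)
    a = rng.choice(n_arms, p=probs)
    r = true_theta[a] @ x + rng.normal(scale=0.1)  # bandit feedback for chosen arm only
    # gradient of log pi(a|x) w.r.t. W is (one_hot(a) - probs) outer x
    grad_log = (np.eye(n_arms)[a] - probs)[:, None] * x[None, :]
    W += lr * (r - baseline) * grad_log
    baseline += 0.01 * (r - baseline)              # simple variance-reducing baseline
```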
2
votes
0 answers
Is it better to model a Contextual Multi-Armed Bandit problem as an MDP with a non-zero discount factor than to treat it as it is?
I'd like to ask if it is, generally, better to model a problem that naturally appears as a Contextual Multi-Armed Bandit, like Recommender Systems, as a Markov Decision Process with a non-zero discount factor (otherwise it's just an MDP with one step…

Daviiid
- 563
- 3
- 15
2
votes
0 answers
Is there a UCB type algorithm for linear stochastic bandit with lasso regression?
Why is there no upper confidence bound algorithm for linear stochastic bandits that uses lasso regression in the case that the regression parameters are sparse in the features?
In particular, I don't understand what is hard about lasso regression…

PJORR
- 21
- 2
1
vote
1 answer
Why is it useful in some applications to use features that are shared by all arms?
In Li et al. (2010)'s highly cited paper, they talk about LinUCB with hybrid linear models in Section 3.2.
They motivate this by saying
In many applications including ours, it is helpful to use features that are shared by all arms, in addition to…

wwl
- 153
- 5
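For reference, the hybrid linear payoff model that Section 3.2 of Li et al. (2010) motivates with that sentence adds a coefficient vector shared by all arms on top of the per-arm coefficients:

$$\mathbb{E}\big[r_{t,a} \mid x_{t,a}, z_{t,a}\big] = z_{t,a}^{\top}\beta^{*} + x_{t,a}^{\top}\theta_{a}^{*},$$

where $z_{t,a}$ collects the features shared by all arms (weighted by the common $\beta^{*}$) and $x_{t,a}$ collects the arm-specific features (weighted by the per-arm $\theta_{a}^{*}$).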
1
vote
1 answer
How can I incorporate domain knowledge to choose actions in the case of large action spaces in multi-armed bandits?
Suppose one is using a multi-armed bandit, and one has relatively few "pulls" (i.e. timesteps) relative to the action set. For example, maybe there are 200 timesteps and 100 possible actions.
However, you do have information on how similar actions…

wwl
- 153
- 5
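A common way to use such similarity information, sketched below under the assumption that each action can be summarized by a feature vector, is to fit one shared linear reward model over action features (LinUCB-style), so every pull also informs the estimates of nearby actions. The 200 timesteps and 100 actions follow the question; the rest is illustrative:

```python
import numpy as np

# Share information across similar actions: describe each of the 100 actions
# by a feature vector and fit ONE linear reward model over those features,
# so a pull of one arm updates the estimate for every similar arm.
rng = np.random.default_rng(2)
n_actions, n_features, alpha = 100, 8, 1.0
action_features = rng.normal(size=(n_actions, n_features))   # domain knowledge
true_w = rng.normal(size=n_features)                          # hidden parameters

A = np.eye(n_features)           # regularized design matrix
b = np.zeros(n_features)

for t in range(200):             # few pulls relative to the number of actions
    A_inv = np.linalg.inv(A)
    w_hat = A_inv @ b
    # UCB score per action: predicted reward + exploration bonus
    ucb = action_features @ w_hat + alpha * np.sqrt(
        np.einsum("ij,jk,ik->i", action_features, A_inv, action_features))
    a = int(np.argmax(ucb))
    r = true_w @ action_features[a] + rng.normal(scale=0.1)
    A += np.outer(action_features[a], action_features[a])
    b += r * action_features[a]
```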
1
vote
0 answers
Name of a multiarmed bandit with only some levers available
In order to model a card game, as an exercise, I was thinking of an elementary setting as a multiarmed bandit, with each lever corresponding to the distribution of expected rewards of a specific card.
But, of course, the player only has some cards in the hand each…

arivero
- 51
- 7
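A rough sketch of one way to handle this, restricting a standard UCB1 index to whatever levers are available in the current round (the hand size and reward model below are made up):

```python
import numpy as np

# UCB1 restricted to the levers that are actually available this round
# (e.g. the cards currently in hand). Unavailable levers keep their
# statistics; they simply cannot be chosen this round.
rng = np.random.default_rng(3)
n_levers = 10
counts = np.zeros(n_levers)
means = np.zeros(n_levers)
true_means = rng.uniform(size=n_levers)

for t in range(1, 2001):
    available = rng.choice(n_levers, size=5, replace=False)   # "cards in hand"
    # UCB1 index; an available lever with no pulls yet gets priority
    ucb = np.where(counts > 0,
                   means + np.sqrt(2 * np.log(t) / np.maximum(counts, 1)),
                   np.inf)
    a = available[np.argmax(ucb[available])]
    r = float(rng.random() < true_means[a])                   # Bernoulli reward
    counts[a] += 1
    means[a] += (r - means[a]) / counts[a]
```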
1
vote
0 answers
Policy gradient (or, more generally, RL algorithms) for problems where actions do not determine the next state (the next state is independent of the action)
I am pretty new to RL. Could anyone suggest results/papers about whether or not policy gradient (or, more generally, RL algorithms) can be applied to problems where actions do not determine the next state? e.g. the next state is independent of the action…

Penn
- 11
- 2
1
vote
1 answer
How to handle delayed rewards in contextual bandits
All the examples I see in tf_Agents for contextual bandits involve a reward function that generates the reward instantly after an observation has been generated.
But in my real-world use case (say, sending emails and waiting for the click rate),…

tjt
- 111
- 3
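A library-agnostic sketch of one way to deal with this, not specific to tf_Agents: log the (context, action) decision at send time and only hand the triple to the bandit's update once the delayed feedback arrives (the delay length and click simulation below are invented):

```python
import random
from collections import deque

# Buffer (context, action) decisions at send time and train only when the
# delayed feedback (click / no click) eventually arrives.
pending = deque()          # decisions still waiting for feedback
dataset = []               # completed (context, action, reward) triples

def act(context):
    # placeholder policy: in a real system this would be the bandit's choice
    return random.randrange(10)

for t in range(1000):
    context = [random.random() for _ in range(5)]
    action = act(context)
    pending.append((t, context, action))

    # pretend feedback arrives with a fixed delay of 50 steps
    while pending and pending[0][0] <= t - 50:
        _, ctx, a = pending.popleft()
        reward = float(random.random() < 0.1)      # simulated click
        dataset.append((ctx, a, reward))
        # here you would call the bandit's update/train step with this triple
```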
1
vote
0 answers
Multi-armed Bandit in optimization of graph edge selection
I have the problem described below, and I wonder whether there is a class of multi-armed bandit approaches related to it.
I am working on computer networking optimization.
In the simplest scenario, we model the network as a graph with a…

Ramon
- 21
- 1
1
vote
1 answer
Why do I get bad results no matter which neural network function approximator I use in my parametrized Q-learning implementation for Contextual Bandits?
I'd like to ask why, no matter which neural network function approximator I use in my parametrized Q-learning implementation for a Contextual Bandits environment, I'm getting bad results. I don't know if it's a problem with my formulation of the problem…

Daviiid
- 563
- 3
- 15
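For comparison, a minimal shape that such a setup can take, sketched in PyTorch with invented dimensions: the network outputs one Q-value per arm, only the chosen arm's output gets a regression target (the immediate reward, since there is no bootstrapping), and exploration is epsilon-greedy:

```python
import torch
import torch.nn as nn

# Parametrized Q-learning for a contextual bandit: Q(context) outputs one
# value per arm; only the chosen arm's output receives a regression target.
n_features, n_arms, epsilon = 8, 4, 0.1
q_net = nn.Sequential(nn.Linear(n_features, 64), nn.ReLU(), nn.Linear(64, n_arms))
opt = torch.optim.Adam(q_net.parameters(), lr=1e-3)

true_theta = torch.randn(n_arms, n_features)          # hypothetical environment

for t in range(2000):
    x = torch.randn(n_features)                       # context
    with torch.no_grad():
        q_values = q_net(x)
    if torch.rand(()) < epsilon:
        a = int(torch.randint(n_arms, ()))            # explore
    else:
        a = int(q_values.argmax())                    # exploit
    r = true_theta[a] @ x + 0.1 * torch.randn(())     # observed reward

    # regression target is the immediate reward (no bootstrapping, gamma = 0)
    loss = (q_net(x)[a] - r) ** 2
    opt.zero_grad()
    loss.backward()
    opt.step()
```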
1
vote
0 answers
(explore-exploit + supervised learning) vs contextual bandits
Let's take an ad recommendation problem for one slot. The feedback is click/no click. I can solve this with contextual bandits, but I can also introduce exploration in supervised learning, where I learn my model from the collected data every k hours.
What can…

dksahuji
- 111
- 2
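For concreteness, the second alternative could look roughly like the sketch below: epsilon-greedy serving on top of a supervised click model that is refit from logged data on a schedule (the model, features, click simulation, and schedule are all illustrative):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Epsilon-greedy serving on top of a supervised click model that is refit
# from the logged (context, ad, click) data every `refit_every` rounds.
rng = np.random.default_rng(4)
n_ads, n_features, epsilon, refit_every = 5, 6, 0.1, 500
log_X, log_y = [], []            # logged (context + ad one-hot) -> click
model = None

def featurize(context, ad):
    one_hot = np.zeros(n_ads)
    one_hot[ad] = 1.0
    return np.concatenate([context, one_hot])

true_w = rng.normal(size=n_features + n_ads)          # hidden click model

for t in range(5000):
    context = rng.normal(size=n_features)
    if model is None or rng.random() < epsilon:
        ad = int(rng.integers(n_ads))                 # exploration / cold start
    else:
        scores = [model.predict_proba(featurize(context, a)[None, :])[0, 1]
                  for a in range(n_ads)]
        ad = int(np.argmax(scores))
    click = int(rng.random() < 1 / (1 + np.exp(-true_w @ featurize(context, ad))))
    log_X.append(featurize(context, ad))
    log_y.append(click)
    if (t + 1) % refit_every == 0 and len(set(log_y)) > 1:
        model = LogisticRegression(max_iter=200).fit(np.array(log_X), np.array(log_y))
```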