Highest Voted 'optimal-policy' Questions - Artificial Intelligence Stack Exchange

5

votes

2 answers

Given two optimal policies, is an affine combination of them also optimal?

If there are two different optimal policies $\pi_1, \pi_2$ in a reinforcement learning task, will the linear combination (or affine combination) of the two policies, $\alpha \pi_1 + \beta \pi_2$ with $\alpha + \beta = 1$, also be an optimal policy? Here I…

asked Nov 18 '20 at 07:04

yang liu

53
3

5

votes

1 answer

What's the optimal policy in the rock-paper-scissors game?

A deterministic policy in the rock-paper-scissors game can be easily exploited by the opponent - by doing just the right sequence of moves to defeat the agent. More often than not, I've heard that a random policy is the optimal policy in this case -…

reinforcement-learning game-theory optimal-policy

asked Aug 27 '20 at 10:22

stoic-santiago

1,121
5
18
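The exploitability point raised in the rock-paper-scissors question above can be illustrated with a small sketch. It assumes the usual zero-sum scoring (win = +1, loss = -1, tie = 0), which the excerpt does not state, and the names `PAYOFF`, `expected_payoff`, and `worst_case_value` are hypothetical helpers introduced only for this illustration.

```python
from fractions import Fraction

# Assumed scoring: PAYOFF[a][b] is our payoff for playing a against b.
# Action indices: 0 = rock, 1 = paper, 2 = scissors.
PAYOFF = [
    [0, -1, 1],   # rock: ties rock, loses to paper, beats scissors
    [1, 0, -1],   # paper: beats rock, ties paper, loses to scissors
    [-1, 1, 0],   # scissors: loses to rock, beats paper, ties scissors
]


def expected_payoff(policy, opponent):
    """Expected payoff when both players sample independently from their policies."""
    return sum(
        policy[a] * opponent[b] * PAYOFF[a][b]
        for a in range(3)
        for b in range(3)
    )


def worst_case_value(policy):
    """Payoff against an opponent who knows the policy and picks the most damaging action."""
    return min(
        sum(policy[a] * PAYOFF[a][b] for a in range(3))
        for b in range(3)
    )


always_rock = [Fraction(1), Fraction(0), Fraction(0)]  # a deterministic policy
uniform = [Fraction(1, 3)] * 3                         # the uniformly random policy

print(worst_case_value(always_rock))  # -1: the opponent simply always plays paper
print(worst_case_value(uniform))      # 0: no opponent action does better than break even
```

Under these assumptions, a fixed deterministic choice can be driven to the worst possible payoff, while the uniform policy guarantees an expected payoff of 0 regardless of what the opponent does. This only compares two particular policies; it is not a proof of optimality.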

4

votes

1 answer

An example of a unique value function which is associated with multiple optimal policies

In the 4th paragraph of http://www.incompleteideas.net/book/ebook/node37.html it is mentioned: "Whereas the optimal value functions for states and state-action pairs are unique for a given MDP, there can be many optimal policies." Could you please…

reinforcement-learning policies value-functions optimal-policy

asked Aug 20 '18 at 09:01

Melanie A

143
2

3

votes

1 answer

What is the difference between a greedy policy and an optimal policy?

I am struggling to understand the difference between an optimal policy and a greedy policy. Let $F(r_{t+1},s_{t+1}| s_t,a_t)$ be the probability distribution according to which, given action $a_t$ in state $s_t$, reward $r_{t+1}$ realizes…

reinforcement-learning comparison value-functions bellman-equations optimal-policy

asked Mar 06 '22 at 09:07

fennel

33
5

3

votes

1 answer

Can an optimal policy have a value function that has a smaller value for a state than a non-optimal policy?

I'm starting to learn about the Bellman equation and a question came to my mind. A policy $\pi$ is optimal if the value $v_\pi(s)$ is greater than or equal to the value $v_{\pi'}(s)$ for all states $s \in S$. Why does this work? Can't it be that the…

reinforcement-learning value-functions policies bellman-equations optimal-policy

asked Nov 08 '21 at 14:15

raphael_mav

133
4

3

votes

1 answer

How is $v_*(s) = \max_{\pi} v_\pi(s)$ also applicable in the case of stochastic policies?

I am reading Sutton & Barto's book "Reinforcement Learning: An Introduction". In this book, they define the optimal value function as: $$v_*(s) = \max_{\pi} v_\pi(s),$$ for all $s \in \mathcal{S}$. Do we take the max over all deterministic policies,…

markov-decision-process value-functions stochastic-policy optimal-policy optimality

asked Mar 26 '21 at 08:15

Tamar

33
3

2

votes

1 answer

Which research community does using a Bayesian regression model as a reward function, with exploration vs. exploitation challenges, fall under?

I am trying to find research papers addressing a problem that, in my opinion, deserves significant attention. However, I am having difficulty locating relevant information. To illustrate the problem at hand, consider a multivariate Bayesian…

reinforcement-learning exploration-exploitation-tradeoff bayesian-optimization optimal-policy

asked Jun 26 '23 at 08:14

paul

33
5

2

votes

1 answer

What does $v(S_{t+1})$ mean in the optimal state-action value function?

In Sutton & Barto's Reinforcement Learning: An Introduction page 63 the authors introduce the optimal state value function in the expression of the optimal action-value function as follows: $q_{*}(s,a)=\mathbb{E}[R_{t+1}+\gamma…

reinforcement-learning optimal-policy

asked Jun 14 '21 at 19:26

Daviiid

563
3
15

1

vote

2 answers

Why is the optimal policy for an infinite horizon MDP deterministic?

Could someone please help me gain some intuition as to why the optimal policy for a Markov Decision Process in the infinite horizon case (agent acts forever) is deterministic?

reinforcement-learning markov-decision-process policies optimal-policy optimality

asked Aug 06 '20 at 07:42

stoic-santiago

1,121
5
18

0

votes

0 answers

What reward should be selected for transition states to make the agent reach the terminal state (destination) faster: negative, positive, or zero?

Consider the simple environment below, where the gray cells are the terminal states and the agent receives a reward of $-5$ for taking any action in these states. The nonterminal states are $S = \{1, 2, \dots, 14\}$. There are four actions possible…

reinforcement-learning markov-decision-process optimal-policy

asked Oct 26 '22 at 16:42

jigz

11
2

0

votes

1 answer

How is policy iteration capable of improving on a deterministic policy?

Given a policy $\pi$ and its improved version $\pi'$ obtained by policy iteration, we have $v_{\pi'}(s) \geq v_{\pi}(s)$ for all $s \in S$. I think the way we choose $\pi'$ makes it deterministic (unless there is a tie but let's not consider…

reinforcement-learning value-functions policy-iteration optimal-policy deterministic-policy

asked Apr 25 '22 at 19:52

Daviiid

563
3
15

0

votes

0 answers

Determine Gridworld values

I am learning reinforcement learning for games by following Gridworld examples. Apologies in advance if this is a basic question; I am very new to reinforcement learning. I am slightly confused in scenarios where the probability of moving up, down, left and…

reinforcement-learning deep-learning deep-rl bellman-equations optimal-policy

asked Apr 14 '22 at 07:46

Krellex

145
4
