Questions tagged [optimal-policy]

For questions related to the concept of "optimal policy" in reinforcement learning.

12 questions
5
votes
2 answers

Given two optimal policies, is an affine combination of them also optimal?

If there are two different optimal policies $\pi_1, \pi_2$ in a reinforcement learning task, will the linear combination (or affine combination) of the two policies $\alpha \pi_1 + \beta \pi_2, \alpha + \beta = 1$ also be an optimal policy? Here I…
5
votes
1 answer

What's the optimal policy in the rock-paper-scissors game?

A deterministic policy in the rock-paper-scissors game can be easily exploited by the opponent - by doing just the right sequence of moves to defeat the agent. More often than not, I've heard that a random policy is the optimal policy in this case -…
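As a rough illustration of the exploitability point raised in this question, the sketch below compares the worst-case expected payoff of a deterministic policy with that of the uniform random policy. The payoff convention (+1 win, 0 draw, -1 loss) and the helper names `expected_payoff` and `worst_case` are assumptions made for this example; they are not taken from the question.

```python
from fractions import Fraction

ACTIONS = ("rock", "paper", "scissors")

# PAYOFF[i][j]: payoff to the agent when it plays ACTIONS[i]
# and the opponent plays ACTIONS[j] (assumed: +1 win, 0 draw, -1 loss).
PAYOFF = (
    (0, -1, 1),   # rock:     draws rock, loses to paper, beats scissors
    (1, 0, -1),   # paper:    beats rock, draws paper, loses to scissors
    (-1, 1, 0),   # scissors: loses to rock, beats paper, draws scissors
)


def expected_payoff(policy, opponent_action):
    """Expected payoff of a mixed policy against one fixed opponent action."""
    return sum(p * PAYOFF[i][opponent_action] for i, p in enumerate(policy))


def worst_case(policy):
    """Expected payoff against the opponent action that hurts the policy most."""
    return min(expected_payoff(policy, j) for j in range(len(ACTIONS)))


uniform = (Fraction(1, 3), Fraction(1, 3), Fraction(1, 3))
always_rock = (Fraction(1), Fraction(0), Fraction(0))

print("uniform worst case:", worst_case(uniform))
print("always-rock worst case:", worst_case(always_rock))
```

Under these assumptions the uniform policy cannot be pushed below an expected payoff of 0, whereas the always-rock policy loses every round to an opponent who always plays paper.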
4
votes
1 answer

An example of a unique value function which is associated with multiple optimal policies

In the 4th paragraph of http://www.incompleteideas.net/book/ebook/node37.html it is mentioned: "Whereas the optimal value functions for states and state-action pairs are unique for a given MDP, there can be many optimal policies." Could you please…
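One way to see how a single value function can be shared by several optimal policies is a toy MDP in which two actions are equally good. The sketch below is a made-up example, not one from the book: the states `start` and `goal`, the actions `left` and `right`, the discount `GAMMA`, and the helper `evaluate` are all hypothetical.

```python
GAMMA = 0.9  # assumed discount factor

# Hypothetical MDP: TRANSITIONS[state][action] = (next_state, reward).
# Both actions in "start" reach the terminal state "goal" with reward 1.
TRANSITIONS = {
    "start": {"left": ("goal", 1.0), "right": ("goal", 1.0)},
    "goal": {},  # terminal state: no actions, value stays 0
}


def evaluate(policy, sweeps=100):
    """Iterative policy evaluation; policy[state][action] is a probability."""
    values = {state: 0.0 for state in TRANSITIONS}
    for _ in range(sweeps):
        for state, actions in TRANSITIONS.items():
            if not actions:
                continue  # skip terminal states
            values[state] = sum(
                policy[state][action] * (reward + GAMMA * values[next_state])
                for action, (next_state, reward) in actions.items()
            )
    return values


always_left = {"start": {"left": 1.0, "right": 0.0}}
always_right = {"start": {"left": 0.0, "right": 1.0}}
mixed = {"start": {"left": 0.5, "right": 0.5}}

for name, policy in [("left", always_left), ("right", always_right), ("mixed", mixed)]:
    print(name, evaluate(policy)["start"])
```

In this toy case the two deterministic policies and their 50/50 mixture all attain the same value in `start`, so the value function is the same while the policy is not.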
3
votes
1 answer

What is the difference between a greedy policy and an optimal policy?

I am struggling to understand the difference between an optimal policy and a greedy policy. Let $F(r_{t+1},s_{t+1}| s_t,a_t)$ be the probability distribution according to which, given action $a_t$ in state $s_t$, reward $r_{t+1}$ realizes…
3
votes
1 answer

Can an optimal policy have a value function that has a smaller value for a state than a non-optimal policy?

I'm starting to learn about the Bellman equation and a question came to my mind. A policy $\pi$ is optimal if the value $v_\pi(s)$ is greater than or equal to the value $v_{\pi'}(s)$ for all states $s \in S$. Why does this work? Can't it be that the…
3
votes
1 answer

How is $v_*(s) = \max_{\pi} v_\pi(s)$ also applicable in the case of stochastic policies?

I am reading Sutton & Barto's book "Introduction to reinforcement learning". In this book, they define the optimal value function as: $$v_*(s) = \max_{\pi} v_\pi(s),$$ for all $s \in \mathcal{S}$. Do we take the max over all deterministic policies,…
2
votes
1 answer

Which research community does using a Bayesian regression model as a reward function with exploration vs. exploitation challenges fall under?

I am trying to find research papers addressing a problem that, in my opinion, deserves significant attention. However, I am having difficulty locating relevant information. To illustrate the problem at hand, consider a multivariate Bayesian…
2
votes
1 answer

What does $v(S_{t+1})$ mean in the optimal state-action value function?

In Sutton & Barto's Reinforcement Learning: An Introduction (page 63), the authors introduce the optimal state-value function in the expression of the optimal action-value function as follows: $q_{*}(s,a)=\mathbb{E}[R_{t+1}+\gamma…
1
vote
2 answers

Why is the optimal policy for an infinite horizon MDP deterministic?

Could someone please help me gain some intuition as to why the optimal policy for a Markov Decision Process in the infinite horizon case (agent acts forever) is deterministic?
0
votes
0 answers

What reward should be selected for transition states to make the agent reach the terminal state (destination) faster? negative, positive, or zero?

Consider the simple environment below, where the gray cells are the terminal states and the agent receives a reward of $-5$ for taking any action in these states. The nonterminal states are $S = \{1, 2, \dots, 14\}$. There are four actions possible…
0
votes
1 answer

How is policy iteration capable of improving on a deterministic policy?

Given a policy $\pi$ and its improved version $\pi'$ obtained by policy iteration, we have $v_{\pi'}(s)\geq v_{\pi}(s)$ for all $s \in S$. I think the way we choose $\pi'$ makes it deterministic (unless there is a tie but let's not consider…
0
votes
0 answers

Determine Gridworld values

I am learning reinforcement learning for games by following Gridworld examples. Apologies in advance if this is a basic question; I am very new to reinforcement learning. I am slightly confused in scenarios where the probability of moving up, down, left and…