Questions tagged [policy-iteration]

For questions related to the policy iteration (PI) algorithm, a dynamic programming algorithm used to solve a Markov decision process (MDP), given the MDP's transition model and reward function (i.e. its model).

35 questions
9 votes, 2 answers

Why are the Bellman operators contractions?

In these slides, it is written \begin{align} \left\|T^{\pi} V-T^{\pi} U\right\|_{\infty} & \leq \gamma\|V-U\|_{\infty} \tag{9} \label{9} \\ \|T V-T U\|_{\infty} & \leq \gamma\|V-U\|_{\infty} \tag{10} \label{10} \end{align} where $F$ is the space of…
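A sketch of the standard argument for the first inequality, assuming the usual definition $(T^{\pi} V)(s) = r(s, \pi(s)) + \gamma \sum_{s'} P(s' \mid s, \pi(s))\, V(s')$ for a deterministic policy $\pi$:
\begin{align}
\left|(T^{\pi} V)(s) - (T^{\pi} U)(s)\right| &= \gamma \left|\sum_{s'} P(s' \mid s, \pi(s))\big(V(s') - U(s')\big)\right| \\
&\leq \gamma \sum_{s'} P(s' \mid s, \pi(s)) \left|V(s') - U(s')\right| \leq \gamma \|V - U\|_{\infty}.
\end{align}
Taking the supremum over $s$ gives (9); (10) follows the same way, using $\left|\max_a f(a) - \max_a g(a)\right| \leq \max_a \left|f(a) - g(a)\right|$ to handle the max in the optimality operator.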
5 votes, 0 answers

What exactly is non-delusional Q-learning?

Problems occur when we combine Q-learning with a function approximator. What exactly are delusional bias and non-delusional Q-learning? I am talking about the NeurIPS 2018 best paper, Non-delusional Q-learning and value iteration. I have trouble…
5 votes, 2 answers

When to use Value Iteration vs. Policy Iteration

Both value iteration and policy iteration are generalized policy iteration (GPI) algorithms. However, they differ in the mechanics of their updates. Policy iteration first finds a complete value function for a policy, then derives the Q…
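As a point of contrast, here is a minimal sketch of the two update loops on a tabular MDP. The model representation is an assumption for illustration: `P[s][a]` is a list of `(prob, next_state)` pairs and `R[s][a]` is the expected immediate reward.

```python
import numpy as np

def q_value(V, s, a, P, R, gamma):
    # One-step lookahead: expected reward plus discounted next-state value.
    return R[s][a] + gamma * sum(p * V[s2] for p, s2 in P[s][a])

def value_iteration(P, R, gamma, n_states, n_actions, tol=1e-8):
    # Repeatedly apply the Bellman optimality backup until the values settle.
    V = np.zeros(n_states)
    while True:
        V_new = np.array([max(q_value(V, s, a, P, R, gamma)
                              for a in range(n_actions))
                          for s in range(n_states)])
        if np.max(np.abs(V_new - V)) < tol:
            return V_new
        V = V_new

def policy_iteration(P, R, gamma, n_states, n_actions, tol=1e-8):
    pi = np.zeros(n_states, dtype=int)
    while True:
        # Policy evaluation: iterate the Bellman expectation backup to convergence.
        V = np.zeros(n_states)
        while True:
            V_new = np.array([q_value(V, s, pi[s], P, R, gamma)
                              for s in range(n_states)])
            if np.max(np.abs(V_new - V)) < tol:
                break
            V = V_new
        # Policy improvement: act greedily with respect to the evaluated V.
        pi_new = np.array([max(range(n_actions),
                               key=lambda a: q_value(V, s, a, P, R, gamma))
                           for s in range(n_states)])
        if np.array_equal(pi_new, pi):
            return pi, V
        pi = pi_new
```

The structural difference is visible here: value iteration folds the max into every sweep, while policy iteration alternates a full evaluation with a separate greedy improvement step.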
5 votes, 1 answer

Does the policy iteration convergence hold for finite-horizon MDP?

Most RL books (Sutton & Barto, Bertsekas, etc.) talk about policy iteration for infinite-horizon MDPs. Does the policy iteration convergence hold for finite-horizon MDP? If yes, how can we derive the algorithm?
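For context: in the finite-horizon setting the optimal value function is time-indexed and is computed by backward induction rather than by iterating to a fixed point. With horizon $H$ and terminal condition $V_H \equiv 0$,
\begin{align}
V_t(s) = \max_{a} \Big[r(s, a) + \sum_{s'} P(s' \mid s, a)\, V_{t+1}(s')\Big], \qquad t = H-1, \dots, 0,
\end{align}
so a finite-horizon analogue of policy iteration has to operate on time-dependent policies $\pi_t$.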
5 votes, 2 answers

Why are policy iteration and value iteration studied as separate algorithms?

In Sutton and Barto's book about reinforcement learning, policy iteration and value iteration are presented as separate/different algorithms. This is confusing because policy iteration includes an update of the value function, and value iteration…
5 votes, 2 answers

How can the policy iteration algorithm be model-free if it uses the transition probabilities?

I'm trying to understand policy iteration in the context of RL. I read an article presenting it, and at some point pseudocode for the algorithm is given. What I can't understand is one particular line: from what I understand, policy…
5 votes, 1 answer

Why is my implementation of Q-learning not converging to the right values in the FrozenLake environment?

I am trying to learn tabular Q-learning using a table of states and actions (i.e. no neural networks). I was trying it out on the FrozenLake environment. It's a very simple environment, where the task is to reach the goal G starting from the source S…
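For reference, a minimal tabular Q-learning loop on FrozenLake, assuming the newer Gym step API (gym >= 0.26); the hyperparameters `alpha`, `gamma`, and `eps` are placeholder choices, and a fixed learning rate or exploration rate that never decays is a common reason the estimates fail to converge:

```python
import numpy as np
import gym

env = gym.make("FrozenLake-v1")
Q = np.zeros((env.observation_space.n, env.action_space.n))
alpha, gamma, eps = 0.1, 0.99, 0.1  # assumed hyperparameters

for episode in range(20000):
    s, _ = env.reset()
    done = False
    while not done:
        # Epsilon-greedy action selection.
        a = env.action_space.sample() if np.random.rand() < eps else int(np.argmax(Q[s]))
        s2, r, terminated, truncated, _ = env.step(a)
        done = terminated or truncated
        # Q-learning update: bootstrap from the greedy value of the next
        # state, using a zero target value at terminal states.
        target = r + gamma * (0.0 if terminated else np.max(Q[s2]))
        Q[s, a] += alpha * (target - Q[s, a])
        s = s2
```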
4 votes, 1 answer

Why doesn't value iteration use $\pi(a \mid s)$ while policy evaluation does?

I was looking at the Bellman equation, and I noticed a difference between the equations used in policy evaluation and value iteration. In policy evaluation, there was the presence of $\pi(a \mid s)$, which indicates the probability of choosing…
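Side by side, the difference being asked about is between the Bellman expectation backup used in policy evaluation, which averages over the policy's action distribution, and the Bellman optimality backup used in value iteration, which replaces that average with a max (in Sutton and Barto's notation):
\begin{align}
v_{\pi}(s) &= \sum_{a} \pi(a \mid s) \sum_{s', r} p(s', r \mid s, a)\big[r + \gamma v_{\pi}(s')\big] \\
v_{*}(s) &= \max_{a} \sum_{s', r} p(s', r \mid s, a)\big[r + \gamma v_{*}(s')\big]
\end{align}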
4 votes, 1 answer

Why is update rule of the value function different in policy evaluation and policy iteration?

In the textbook "Reinforcement Learning: An Introduction", by Richard Sutton and Andrew Barto, the pseudo code for Policy Evaluation is given as follows: The update equation for $V(s)$ comes from the Bellman equation for $v_{\pi}(s)$ which is…
4 votes, 2 answers

Would you categorize policy iteration as an actor-critic reinforcement learning approach?

One way of understanding the difference between value function approaches, policy approaches and actor-critic approaches in reinforcement learning is the following: A critic explicitly models a value function for a policy. An actor explicitly…
4 votes, 1 answer

Understanding the update rule for the policy in the policy iteration algorithm

Consider the grid world problem in RL. Formally, a policy in RL is defined as $\pi(a \mid s)$. If we are solving grid world by policy iteration, then the following pseudocode is used. My question is related to the policy improvement step. Specifically, I…
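The improvement step in that pseudocode makes the policy greedy with respect to the current value function. A sketch under an assumed model representation, with `p[s][a]` a list of `(prob, next_state, reward)` triples:

```python
def policy_improvement(V, p, states, actions, gamma=0.9):
    """Return the deterministic policy that is greedy with respect to V."""
    pi = {}
    for s in states:
        # Argmax over actions of the one-step lookahead value.
        pi[s] = max(actions,
                    key=lambda a: sum(prob * (r + gamma * V[s2])
                                      for prob, s2, r in p[s][a]))
    return pi
```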
3 votes, 1 answer

Are policy and value iteration used only in grid world like scenarios?

I am trying to teach myself reinforcement learning. At the moment I am focusing on policy and value iteration, and I am running into several problems and doubts. One of the main doubts stems from the fact that I can't find many diversified examples of how…
3 votes, 0 answers

What are some strong algorithms for Perfect Information, Deterministic Multiplayer Games?

I have a series of games with the following properties: 3 or more players, but purely non-cooperative (i.e., no coalition formation); sequential moves; perfect information; deterministic state transitions and rewards; and the game size is large enough to…
3 votes, 1 answer

What is generalized policy iteration?

I am reading Sutton and Barto's material now. I know value iteration, an iterative algorithm that backs up the maximum value over successor states, and policy iteration. But what is generalized policy iteration?
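In Sutton and Barto's usage, generalized policy iteration is the umbrella pattern of letting an evaluation process and an improvement process interact, with no requirement that either runs to completion. Schematically (a sketch with `evaluate` and `improve` as placeholders):

```python
def generalized_policy_iteration(pi, V, evaluate, improve, n_steps):
    # Neither step needs to run to convergence: one sweep of evaluation
    # and a greedy (or epsilon-greedy) improvement already fit the pattern.
    for _ in range(n_steps):
        V = evaluate(pi, V)
        pi = improve(V)
    return pi, V
```

Policy iteration (full evaluation, then greedy improvement) and value iteration (a single evaluation sweep fused with the max) are the two extremes of this pattern.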
3 votes, 2 answers

Why do value iteration and policy iteration obtain similar policies even though they have different value functions?

I am trying to implement value and policy iteration algorithms. My value function from policy iteration looks vastly different from the values from value iteration, but the policy obtained from both is very similar. How is this possible? And what…
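Part of the usual answer: the greedy policy depends only on which action attains the argmax, so value functions can differ noticeably while inducing the same actions. For example, shifting a value function by a constant $c$ shifts every action's one-step lookahead value by the same amount $\gamma c$:
\begin{align}
\arg\max_{a} \Big[r(s, a) + \gamma \sum_{s'} P(s' \mid s, a)\,(V(s') + c)\Big] = \arg\max_{a} \Big[r(s, a) + \gamma \sum_{s'} P(s' \mid s, a)\,V(s')\Big],
\end{align}
and a similar insensitivity explains why values produced by stopping value iteration early can still yield the optimal policy.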