For questions related to policy evaluation (PE) algorithms: iterative numerical algorithms used to find the value function associated with a given policy, a task often referred to as the "prediction problem". Iterative policy evaluation is also a dynamic programming method that is regularly discussed in reinforcement learning textbooks.
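For concreteness, below is a minimal sketch of iterative policy evaluation on a tabular MDP. The dense-array layout for `P`, `R`, and `policy` is an assumption made for illustration, not a fixed convention.

```python
import numpy as np

def iterative_policy_evaluation(P, R, policy, gamma=0.9, theta=1e-8):
    """Iterative policy evaluation for a tabular MDP (a minimal sketch).

    Assumed array layout (for illustration only):
      P[a, s, s'] : transition probabilities, shape (num_actions, num_states, num_states)
      R[s, a]     : expected immediate reward for taking action a in state s
      policy[s, a]: probability of taking action a in state s
    """
    num_states, num_actions = policy.shape
    V = np.zeros(num_states)
    while True:
        delta = 0.0
        for s in range(num_states):
            v_old = V[s]
            # Bellman expectation backup: average over actions and next states
            V[s] = sum(
                policy[s, a] * (R[s, a] + gamma * P[a, s] @ V)
                for a in range(num_actions)
            )
            delta = max(delta, abs(v_old - V[s]))
        if delta < theta:  # stop when the largest change in a sweep is tiny
            break
    return V
```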
Questions tagged [policy-evaluation]
14 questions
20
votes
2 answers
What is the difference between First-Visit Monte-Carlo and Every-Visit Monte-Carlo Policy Evaluation?
I came across these two algorithms, but I cannot understand the difference between them, both in terms of implementation and intuition.
So, what difference does the second point in both slides refer to?
user9947
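As a rough illustration of the distinction (this is a sketch, not the pseudocode from the slides referenced in the question), the two estimators differ only in which occurrences of a state within an episode contribute a return. The episode format below, a list of (state, reward) pairs, is an assumption.

```python
from collections import defaultdict

def mc_update(episode, V, returns_count, gamma=1.0, first_visit=True):
    """Accumulate Monte Carlo returns for one episode (a rough sketch).

    episode: list of (state, reward) pairs in the order they were visited.
    V: running mean of returns per state (e.g. defaultdict(float)).
    returns_count: number of returns averaged per state (e.g. defaultdict(int)).
    """
    # Compute the return G_t following each time step, working backwards.
    G = 0.0
    updates = []
    for state, reward in reversed(episode):
        G = gamma * G + reward
        updates.append((state, G))
    updates.reverse()  # back to chronological order

    seen = set()
    for state, G in updates:
        if first_visit:
            if state in seen:
                continue  # first-visit: skip later occurrences within the episode
            seen.add(state)
        # every-visit: every occurrence of the state contributes its return
        returns_count[state] += 1
        V[state] += (G - V[state]) / returns_count[state]  # incremental mean
```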
8
votes
2 answers
What is the proof that policy evaluation converges to the optimal solution?
Although I know how the iterative policy evaluation algorithm based on dynamic programming works, I am having a hard time seeing why it actually converges.
Intuitively, it seems that, with each iteration, we get a better and better…

SAGALPREET SINGH
- 147
- 1
- 6
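For reference, the usual convergence argument (sketched here, not the full proof the question asks for) is that the Bellman expectation operator $T^{\pi}$ is a $\gamma$-contraction in the sup norm, so for $\gamma < 1$ the iterates of policy evaluation converge to its unique fixed point $v_{\pi}$ by the Banach fixed-point theorem:
$$(T^{\pi} v)(s) = \sum_{a} \pi(a \mid s) \sum_{s', r} p(s', r \mid s, a)\,\bigl[\, r + \gamma\, v(s') \,\bigr], \qquad \lVert T^{\pi} u - T^{\pi} v \rVert_{\infty} \le \gamma\, \lVert u - v \rVert_{\infty}.$$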
4
votes
1 answer
Why is the update rule of the value function different in policy evaluation and policy iteration?
In the textbook "Reinforcement Learning: An Introduction" by Richard Sutton and Andrew Barto, the pseudocode for policy evaluation is given as follows:
The update equation for $V(s)$ comes from the Bellman equation for $v_{\pi}(s)$, which is…

Nishanth Rao
- 147
- 6
4
votes
3 answers
Why can the Bellman equation be turned into an update rule?
In chapter 4.1 of Sutton's book, the Bellman equation is turned into an update rule by simply changing its indices. How is that mathematically justified? I didn't quite get the intuition for why we are allowed to do that.
$$v_{\pi}(s) = \mathbb…

Saeid Ghafouri
- 113
- 5
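For context, the update rule the question refers to takes the Bellman equation for $v_{\pi}$ and indexes the two sides by successive iterates, turning the fixed-point condition into a successive-approximation sweep:
$$v_{k+1}(s) = \sum_{a} \pi(a \mid s) \sum_{s', r} p(s', r \mid s, a)\,\bigl[\, r + \gamma\, v_{k}(s') \,\bigr], \qquad k = 0, 1, 2, \ldots$$
The justification is that $v_{\pi}$ is the unique fixed point of this mapping (for $\gamma < 1$), so the sequence $v_k$ converges to it regardless of the initial $v_0$.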
4
votes
1 answer
How does policy evaluation work for continuous state space model-free approaches?
How does policy evaluation work for continuous state space model-free approaches?
Theoretically, in a model-based setting with discrete state and action spaces, the value function can be computed via dynamic programming by solving the Bellman equation.
Let's say you…

calveeen
- 1,251
- 7
- 17
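One common model-free answer for continuous state spaces is semi-gradient TD(0) with function approximation. The sketch below uses linear features; the `env`, `policy`, and `features` interfaces are hypothetical placeholders, not a real library API.

```python
import numpy as np

def semi_gradient_td0(env, policy, features, num_features,
                      episodes=1000, alpha=0.01, gamma=0.99):
    """Model-free policy evaluation on a continuous state space via
    semi-gradient TD(0) with linear function approximation (a sketch).

    Assumed (hypothetical) interfaces:
      features(s) -> np.ndarray of length num_features
      policy(s)   -> action
      env.reset() -> s ;  env.step(a) -> (s_next, reward, done)
    """
    w = np.zeros(num_features)
    for _ in range(episodes):
        s = env.reset()
        done = False
        while not done:
            a = policy(s)
            s_next, r, done = env.step(a)
            x = features(s)
            # TD(0) target bootstraps from the current estimate w . x(s')
            target = r if done else r + gamma * (features(s_next) @ w)
            w += alpha * (target - x @ w) * x
            s = s_next
    return w  # estimated value: v_hat(s) ~= features(s) @ w
```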
2
votes
1 answer
Is the existence and uniqueness of the state-value function for $\gamma < 1$ theoretical?
Consider the following statement from section 4.1 (Policy Evaluation) of the first edition of Sutton and Barto's book.
The existence and uniqueness of $V^{\pi}$ are guaranteed as long as
either $\gamma < 1$ or eventual termination is guaranteed from…

hanugm
- 3,571
- 3
- 18
- 50
2
votes
1 answer
Can we use Q-learning update for policy evaluation (not control)?
For policy evaluation purposes, can we use the Q-learning algorithm even though, technically, it is meant for control?
Maybe like this:
Use the policy to be evaluated as the behaviour policy.
Update the Q value conventionally (i.e. updating…

Dhruv Mullick
- 123
- 4
2
votes
1 answer
Why do we need to go back to policy evaluation after policy improvement if the policy is not stable?
Above is the algorithm for Policy Iteration from Sutton's RL book. So, step 2 actually looks like value iteration, and then, at step 3 (policy improvement), if the policy isn't stable, it goes back to step 2.
I don't really understand this: it seems…

user8714896
- 717
- 1
- 4
- 21
2
votes
1 answer
Is value iteration stopped after one update of each state?
In section 4.4 Value Iteration, the authors write
One important special case is when policy evaluation is stopped after just one sweep (one update of each state). This algorithm is called value iteration.
After that, they provide the following…

Alex
- 23
- 3
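As a companion sketch to the quote above (same assumed array layout as the policy evaluation sketch near the top of this page, which is an assumption for illustration), value iteration folds the one-sweep policy evaluation and the greedy improvement into a single max-backup:

```python
import numpy as np

def value_iteration(P, R, gamma=0.9, theta=1e-8):
    """Value iteration as truncated policy evaluation (a sketch).

    Assumed array layout:
      P[a, s, s'] : transition probabilities
      R[s, a]     : expected immediate reward
    """
    num_actions, num_states, _ = P.shape
    V = np.zeros(num_states)
    while True:
        delta = 0.0
        for s in range(num_states):
            v_old = V[s]
            # max over actions replaces the expectation over a fixed policy
            V[s] = max(R[s, a] + gamma * P[a, s] @ V for a in range(num_actions))
            delta = max(delta, abs(v_old - V[s]))
        if delta < theta:
            break
    return V
```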
2
votes
1 answer
How can I implement policy evaluation when reward is tied to an action outcome?
I'm following the Stanford reinforcement learning videos on YouTube. One of the assignments asks us to write code for policy evaluation for Gym's FrozenLake-v0 environment.
In the course (and books I have seen), they define policy evaluation…

Argod
- 23
- 2
2
votes
0 answers
Difficulty understanding Monte Carlo policy evaluation (state-value) for gridworld
I've been trying to read chapter 5.1 of the Sutton & Barto book, but I'm still a bit confused about the procedure for Monte Carlo policy evaluation (p. 92), and now I just can't proceed with coding a Python solution, because I feel like I don't fully…

Late347
- 59
- 4
1
vote
1 answer
Why is the update in-place faster than the out-of-place one in dynamic programming?
In Barto and Sutton's book, it's written that we have two types of updates in dynamic programming:
- out-of-place updates
- in-place updates
The in-place update is said to be the faster one. Why is that the case?
This is the pseudocode that I used to test it.
if…

VanasisB
- 13
- 3
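As a rough illustration of the two sweep variants the question above contrasts (a sketch, not the asker's pseudocode), where `backup` is a hypothetical per-state Bellman backup function:

```python
import numpy as np

def sweep_out_of_place(V, backup):
    """One sweep that reads only the values from the previous sweep."""
    V_new = np.empty_like(V)
    for s in range(len(V)):
        V_new[s] = backup(s, V)   # every backup sees the old array
    return V_new

def sweep_in_place(V, backup):
    """One sweep that overwrites values as it goes."""
    for s in range(len(V)):
        V[s] = backup(s, V)       # later backups see a mix of old and new values
    return V
```

Both variants do the same amount of arithmetic per sweep; the in-place version typically needs fewer sweeps because backups later in a sweep already use the freshly updated values of earlier states, which is presumably what "faster" refers to in the book.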
1
vote
1 answer
Why isn't the implementation of my policy evaluation for a simple MDP converging?
I am trying to code a policy evaluation algorithm to find $V^\pi(s)$ for all states. The diagram below shows the MDP.
In this case I let $p = q = 0.5$.
The rewards for each state are independent of the action, i.e. $r(\sigma_0)$ =…

calveeen
- 1,251
- 7
- 17
-1
votes
1 answer
Using states (features) and actions from a heuristic model to estimate the value function of a reinforcement learning agent
New to RL here.
As far as I understood from RL courses, there are two sides of reinforcement learning: policy evaluation, which is the task of finding the value function for a certain policy, and control, which is maximizing the reward or the…

Ramzy
- 3
- 5