Questions tagged [value-iteration]

For questions related to the value iteration algorithm, a dynamic programming (DP) algorithm used to solve a Markov decision process (MDP), i.e. to find an optimal policy given the transition and reward functions of the MDP. Value iteration is closely related to another DP algorithm called policy iteration.

For more info, see e.g. http://www.incompleteideas.net/book/first/ebook/node44.html.
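As a quick orientation for the tag, below is a minimal sketch of tabular value iteration in Python. It assumes a model given as `P[s][a] = [(prob, next_state, reward), ...]` (a hypothetical layout, loosely modelled on what Gym's FrozenLake exposes); it is an illustration, not a reference implementation.

```python
import numpy as np

def value_iteration(P, n_states, n_actions, gamma=0.9, tol=1e-8):
    """Tabular value iteration over an explicit MDP model.

    P[s][a] is assumed to be a list of (prob, next_state, reward) tuples
    describing the transition and reward functions.
    """
    V = np.zeros(n_states)  # any initial values work (see the first question below)
    while True:
        V_new = np.empty_like(V)
        for s in range(n_states):
            # Bellman optimality backup: V(s) <- max_a sum_{s'} p(s'|s,a) [r + gamma V(s')]
            V_new[s] = max(
                sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a])
                for a in range(n_actions)
            )
        delta = np.max(np.abs(V_new - V))
        V = V_new
        if delta < tol:
            break
    # Extract a greedy policy from the (near-)optimal value function
    policy = [
        max(range(n_actions),
            key=lambda a: sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a]))
        for s in range(n_states)
    ]
    return V, policy
```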

46 questions
7 votes
2 answers

In Value Iteration, why can we initialize the value function arbitrarily?

I have not been able to find a good explanation of this, other than statements that the algorithm is guaranteed to converge with arbitrary choices for initial values in each state. Is this something to do with the Bellman optimality constraint…
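For context, the standard textbook argument (not part of the excerpt) is that the Bellman optimality operator $T$, defined by $(Tv)(s) = \max_a \sum_{s', r} p(s', r \mid s, a)\,[r + \gamma v(s')]$, is a $\gamma$-contraction in the max norm, so iterating it from any starting point $v_0$ converges to the unique fixed point $v_*$ when $0 \le \gamma < 1$:
$$\| T v - T v' \|_\infty \le \gamma \, \| v - v' \|_\infty
\quad\Longrightarrow\quad
\| T^k v_0 - v_* \|_\infty \le \gamma^k \, \| v_0 - v_* \|_\infty \to 0 .$$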
5 votes
1 answer

Should the reward or the Q-value be clipped for reinforcement learning?

When extending reinforcement learning to the continuous-state, continuous-action case, we must use function approximators (linear or non-linear) to approximate the Q-value. It is well known that non-linear function approximators, such as neural…
5 votes
1 answer

How is the fitted Q-iteration algorithm related to $Q^*(s, a)$, and how can we use function approximation with this algorithm?

I hope to get some clarifications on Fitted Q-Iteration (FQI). My Research So Far I've read Sutton's book (specifically, chapters 6 to 10), Ernst et al., and this paper. I know that $Q^*(s, a)$ expresses the expected value of first taking action $a$ from…
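As a rough illustration of the FQI idea (regress $Q$ on bootstrapped targets built from a fixed batch of transitions), here is a hedged sketch; the array layout and the choice of `ExtraTreesRegressor` are assumptions made for this example, not a claim about the cited papers' exact setup.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor

def fitted_q_iteration(S, A, R, S_next, n_actions, gamma=0.99, n_iters=50):
    """Batch FQI sketch: S is (N, d), A is (N,), R is (N,), S_next is (N, d)."""
    X = np.hstack([S, A.reshape(-1, 1)])  # regress Q on (state, action) pairs
    q_model = None
    for _ in range(n_iters):
        if q_model is None:
            y = R  # first iteration: Q_1 is approximately the immediate reward
        else:
            # Bootstrapped target: r + gamma * max_a' Q_k(s', a')
            q_next = np.column_stack([
                q_model.predict(np.hstack([S_next, np.full((len(S_next), 1), a)]))
                for a in range(n_actions)
            ])
            y = R + gamma * q_next.max(axis=1)
        q_model = ExtraTreesRegressor(n_estimators=50).fit(X, y)  # supervised refit
    return q_model
```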
5 votes
0 answers

What exactly is non-delusional Q-learning?

Problems occur when we combine Q-learning with a function approximator. What exactly are the delusional bias and non-delusional Q-learning? I am talking about the NeurIPS 2018 best paper Non-delusional Q-learning and value-iteration. I have trouble…
5 votes
2 answers

Why are policy iteration and value iteration studied as separate algorithms?

In Sutton and Barto's book about reinforcement learning, policy iteration and value iteration are presented as separate/different algorithms. This is very confusing because policy iteration includes an update/change of value and value iteration…
5 votes
1 answer

Why is my implementation of Q-learning not converging to the right values in the FrozenLake environment?

I am trying to learn tabular Q-learning by using a table of states and actions (i.e. no neural networks). I was trying it out on the FrozenLake environment. It's a very simple environment, where the task is to reach the goal G starting from a source S…
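For comparison when debugging (a minimal sketch, not the asker's code), here is the core tabular Q-learning update, including the terminal-state handling that is a common source of non-convergence on FrozenLake:

```python
import numpy as np

def q_learning_update(Q, s, a, r, s_next, done, alpha=0.1, gamma=0.99):
    """One tabular Q-learning update on an (n_states, n_actions) array Q."""
    # Bootstrap only from non-terminal successors; continuing to bootstrap
    # after the episode ends is a frequent FrozenLake pitfall.
    target = r if done else r + gamma * np.max(Q[s_next])
    Q[s, a] += alpha * (target - Q[s, a])
```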
4 votes
1 answer

What should the discount factor for the non-slippery version of the FrozenLake environment be?

I was working with FrozenLake 4x4 from OpenAI Gym. In the slippery case, using a discount factor of 1, my value iteration implementation was giving a success rate of around 75 percent. It was much worse for the 8x8 grid, with success around 50%…
4 votes
1 answer

Why doesn't value iteration use $\pi(a \mid s)$ while policy evaluation does?

I was looking at the Bellman equation, and I noticed a difference between the equations used in policy evaluation and value iteration. In policy evaluation, there was the presence of $\pi(a \mid s)$, which indicates the probability of choosing…
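For reference, the two updates from Sutton & Barto side by side: policy evaluation averages over the policy's action probabilities, while value iteration takes the maximum over actions.
$$\text{Policy evaluation:}\quad
v_{k+1}(s) = \sum_{a} \pi(a \mid s) \sum_{s', r} p(s', r \mid s, a)\,\bigl[r + \gamma v_k(s')\bigr]$$
$$\text{Value iteration:}\quad
v_{k+1}(s) = \max_{a} \sum_{s', r} p(s', r \mid s, a)\,\bigl[r + \gamma v_k(s')\bigr]$$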
4 votes
1 answer

Why can't we apply value iteration when we do not know the reward and transition functions, and how does Q-learning solve this issue?

I don't understand why we can't apply value iteration when we don't know the reward and transition probabilities. In this lecture, the lecturer says it has to do with not being able to take the max with samples, but what does this mean? Why does Q-learning…
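One way to see the issue the excerpt raises: the value iteration backup needs the model ($p(s' \mid s, a)$ and the reward function) explicitly, whereas the Q-learning update only uses a sampled transition $(s, a, r, s')$.
$$\text{Value iteration (model-based):}\quad
V(s) \leftarrow \max_{a} \sum_{s'} p(s' \mid s, a)\,\bigl[r(s, a, s') + \gamma V(s')\bigr]$$
$$\text{Q-learning (sample-based):}\quad
Q(s, a) \leftarrow Q(s, a) + \alpha \bigl[r + \gamma \max_{a'} Q(s', a') - Q(s, a)\bigr]$$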
4 votes
2 answers

Would you categorize policy iteration as an actor-critic reinforcement learning approach?

One way of understanding the difference between value function approaches, policy approaches and actor-critic approaches in reinforcement learning is the following: A critic explicitly models a value function for a policy. An actor explicitly…
4 votes
1 answer

Understanding the update rule for the policy in the policy iteration algorithm

Consider the grid world problem in RL. Formally, a policy in RL is defined as $\pi(a|s)$. If we are solving the grid world by policy iteration, then the following pseudocode is used: My question is related to the policy improvement step. Specifically, I…
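For reference (the standard improvement step from the usual pseudocode, stated here since the pseudocode image is not reproduced in the excerpt): the improved policy acts greedily with respect to the value function of the current policy,
$$\pi'(s) \doteq \arg\max_{a} \sum_{s', r} p(s', r \mid s, a)\,\bigl[r + \gamma v_{\pi}(s')\bigr].$$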
4 votes
1 answer

A few questions regarding the difference between policy iteration and value iteration

The question already has some answers, but I am still finding it quite unclear (also, does $\pi(s)$ here mean $q(s,a)$?). The few things I do not understand are: why the difference between the 2 iterations if we are acting greedily in each of them? As…
3 votes
1 answer

What is the time complexity of the value iteration algorithm?

Recently, I have come across the information (lectures 8 and 9 on MDPs of this UC Berkeley AI course) that the time complexity for each iteration of the value iteration algorithm is $\mathcal{O}(|S|^{2}|A|)$, where $|S|$ is the number of states…
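The $\mathcal{O}(|S|^{2}|A|)$ per-iteration cost can be read off the loop structure of a single sweep over a dense model. A sketch, assuming hypothetical dense arrays `P[s, a, s2]` (transition probabilities) and `R[s, a, s2]` (rewards):

```python
import numpy as np

def one_sweep(V, P, R, gamma=0.9):
    """One value iteration sweep; the three nested loops give O(|S|^2 |A|)."""
    n_states, n_actions, _ = P.shape
    V_new = np.empty(n_states)
    for s in range(n_states):                      # |S| states
        best = -np.inf
        for a in range(n_actions):                 # |A| actions
            q_sa = sum(P[s, a, s2] * (R[s, a, s2] + gamma * V[s2])
                       for s2 in range(n_states))  # |S| successor states
            best = max(best, q_sa)
        V_new[s] = best
    return V_new
```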
3 votes
1 answer

Are policy and value iteration used only in grid world like scenarios?

I am trying to teach myself reinforcement learning. At the moment I am focusing on policy and value iteration, and I am running into several problems and doubts. One of the main doubts is that I can't find many diversified examples of how…
3 votes
2 answers

What is the value of a state when there is a certain probability that the agent will die after each step?

We assume an infinite horizon and discount factor $\gamma = 1$. At each step, after the agent takes an action and gets its reward, there is a probability $\alpha = 0.2$ that the agent will die. The assumed maze looks like this. Possible actions are go…
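A common way to reason about such setups (noted here as a standard observation, since the excerpt is cut off): a per-step survival probability of $1 - \alpha$ acts like an extra discount factor on future rewards, so with $\gamma = 1$ and $\alpha = 0.2$ the expected return behaves as if discounted by
$$\gamma_{\text{eff}} = \gamma\,(1 - \alpha) = 1 \times 0.8 = 0.8 .$$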