For questions related to the concept of a value (or performance, quality, or utility) function, as defined in reinforcement learning and other AI sub-fields. An example of this type of function is the Q function (used e.g. in the Q-learning algorithm), also known as the state-action value function, since $Q: S \times A \rightarrow \mathbb{R}$, where $S$ and $A$ are respectively the sets of states and actions of the environment.
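For reference, both value functions covered by this tag are usually defined as expected returns under a policy $\pi$:
$$v_{\pi}(s) = \mathbb{E}_{\pi}[G_t \mid S_t = s], \qquad q_{\pi}(s, a) = \mathbb{E}_{\pi}[G_t \mid S_t = s, A_t = a],$$
where $G_t = \sum_{k=0}^{\infty} \gamma^{k} R_{t+k+1}$ is the discounted return and $\gamma \in [0, 1)$ is the discount factor.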
Questions tagged [value-functions]
104 questions
10
votes
1 answer
What is the difference between expected return and value function?
I've seen numerous mathematical explanations of reward, value functions $V(s)$, and return functions. The reward provides an immediate return for being in a specific state. The better the reward, the better the state.
As I understand it, it can be…

user3168961
- 221
- 2
- 6
7
votes
2 answers
In Value Iteration, why can we initialize the value function arbitrarily?
I have not been able to find a good explanation of this, other than statements that the algorithm is guaranteed to converge with arbitrary choices for initial values in each state. Is this something to do with the Bellman optimality constraint…

Arham
- 73
- 3
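One way to see why the initialisation in the question above does not matter: value iteration repeatedly applies the Bellman optimality backup, which is a $\gamma$-contraction in the sup norm, so every starting point $V_0$ is pulled to the same fixed point $V^*$:
$$V_{k+1}(s) = \max_{a} \sum_{s', r} p(s', r \mid s, a)\,\bigl[r + \gamma\, V_k(s')\bigr], \qquad \lVert V_{k+1} - V^{*} \rVert_{\infty} \le \gamma\, \lVert V_k - V^{*} \rVert_{\infty}.$$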
7
votes
2 answers
Why does the state-action value function, defined as an expected value of the reward and state value function, not need to follow a policy?
I often see that the state-action value function is expressed as:
$$q_{\pi}(s,a)=\color{red}{\mathbb{E}_{\pi}}[R_{t+1}+\gamma G_{t+1} | S_t=s, A_t = a] = \color{blue}{\mathbb{E}}[R_{t+1}+\gamma v_{\pi}(s') |S_t = s, A_t =a]$$
Why does expressing the…

Daniel Wiczew
- 323
- 2
- 10
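A sketch of the step the question above is about: conditioning on $A_t = a$ fixes the first action, and all later actions are already averaged over $\pi$ inside $v_{\pi}$, so the remaining (blue) expectation is only over the environment dynamics:
$$\mathbb{E}_{\pi}[R_{t+1}+\gamma G_{t+1} \mid S_t=s, A_t=a] = \sum_{s', r} p(s', r \mid s, a)\,\bigl[r + \gamma\, \mathbb{E}_{\pi}[G_{t+1} \mid S_{t+1}=s']\bigr] = \mathbb{E}[R_{t+1}+\gamma\, v_{\pi}(S_{t+1}) \mid S_t=s, A_t=a].$$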
7
votes
1 answer
Why is the state-action value function used more than the state value function?
In reinforcement learning, the state-action value function seems to be used more than the state value function. Why is it so?

Bhuwan Bhatt
- 394
- 1
- 11
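One common reason, shown as a contrast: acting greedily with respect to $q$ needs no model of the dynamics, whereas acting greedily with respect to $v$ requires a one-step lookahead through the transition model $p(s', r \mid s, a)$:
$$\pi(s) = \arg\max_{a} q(s, a) \qquad \text{vs.} \qquad \pi(s) = \arg\max_{a} \sum_{s', r} p(s', r \mid s, a)\,\bigl[r + \gamma\, v(s')\bigr].$$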
6
votes
1 answer
When to use the state value function $V(s)$ and when to use the state-action value function $Q(s, a)$?
I saw the difference between the value function $V(s)$ and $Q(s, a)$. But when do I use each one? When I coded in Matlab I only used $Q(s, a)$ directly (as I was thinking of a tabular approach). So, when is one more beneficial than the other? I have a large…

knowledge_seeker
- 97
- 7
6
votes
2 answers
What is the Bellman equation actually telling us?
What does the Bellman equation actually say? And are there many flavours of it?
I get a little confused when I look up the Bellman equation, because I feel like people say slightly different things about what it is. And I think the…

Johnny
- 69
- 3
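The version most often meant in the question above is the Bellman expectation equation for $v_{\pi}$, which expresses a state's value recursively through its successor states:
$$v_{\pi}(s) = \sum_{a} \pi(a \mid s) \sum_{s', r} p(s', r \mid s, a)\,\bigl[r + \gamma\, v_{\pi}(s')\bigr];$$
the other common flavours are the analogous equation for $q_{\pi}$ and the optimality versions obtained by replacing the average over $\pi$ with $\max_{a}$.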
6
votes
3 answers
What is the target Q-value in DQNs?
I understand that in DQNs, the loss is measured by taking the MSE between the predicted Q-values and the target Q-values.
What do the target Q-values represent? And how are they obtained/calculated by the DQN?

BG10
- 113
- 1
- 7
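For the DQN question above, the target for a sampled transition $(s, a, r, s')$ is usually the one-step bootstrapped estimate computed with a separate, periodically copied target network $\theta^{-}$ (with $y = r$ at terminal states):
$$y = r + \gamma \max_{a'} Q(s', a'; \theta^{-}), \qquad \mathcal{L}(\theta) = \bigl(y - Q(s, a; \theta)\bigr)^{2}.$$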
5
votes
2 answers
When to use Value Iteration vs. Policy Iteration
Both value iteration and policy iteration are Generalised Policy Iteration (GPI) algorithms. However, they differ in the mechanics of their updates. Policy iteration seeks to first find a converged value function for a policy, then derive the Q…

SeeDerekEngineer
- 521
- 4
- 11
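A minimal tabular sketch of the contrast in the question above, assuming a hypothetical known model `P[s][a]` given as a list of `(prob, next_state, reward, done)` tuples (the convention used by Gym's toy-text environments): value iteration folds the max over actions into every sweep, while policy iteration alternates full evaluation of the current policy with greedy improvement.

```python
import numpy as np

def backup(P, V, s, a, gamma):
    # Expected one-step return of taking action a in state s under the assumed model P.
    return sum(p * (r + gamma * V[s2] * (not done)) for p, s2, r, done in P[s][a])

def value_iteration(P, n_states, n_actions, gamma=0.99, tol=1e-8):
    V = np.zeros(n_states)  # arbitrary initialisation; value iteration converges regardless
    while True:
        V_new = np.array([max(backup(P, V, s, a, gamma) for a in range(n_actions))
                          for s in range(n_states)])
        if np.max(np.abs(V_new - V)) < tol:
            return V_new
        V = V_new

def policy_iteration(P, n_states, n_actions, gamma=0.99, tol=1e-8):
    policy = np.zeros(n_states, dtype=int)
    while True:
        # Policy evaluation: iterate the Bellman expectation backup to convergence.
        V = np.zeros(n_states)
        while True:
            V_new = np.array([backup(P, V, s, policy[s], gamma) for s in range(n_states)])
            if np.max(np.abs(V_new - V)) < tol:
                break
            V = V_new
        # Policy improvement: act greedily with respect to the evaluated V.
        new_policy = np.array([int(np.argmax([backup(P, V, s, a, gamma)
                                              for a in range(n_actions)]))
                               for s in range(n_states)])
        if np.array_equal(new_policy, policy):
            return policy, V
        policy = new_policy
```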
5
votes
1 answer
How would I compute the optimal state-action value for a certain state and action?
I am currently trying to learn reinforcement learning and I started with the basic gridworld application. I tried Q-learning with the following parameters:
Learning rate = 0.1
Discount factor = 0.95
Exploration rate = 0.1
Default reward = 0
The…

Rim Sleimi
- 215
- 1
- 6
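With the parameters listed above, each step of tabular Q-learning would apply the standard update, and the optimal state-action value the question asks about is the fixed point it converges to (under the usual conditions on exploration and the learning rate):
$$Q(s, a) \leftarrow Q(s, a) + \alpha\,\bigl[r + \gamma \max_{a'} Q(s', a') - Q(s, a)\bigr], \qquad \alpha = 0.1,\; \gamma = 0.95.$$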
5
votes
4 answers
How to stop DQN Q function from increasing during learning?
Following the DQN algorithm with experience replay:
Store transition $\left(\phi_{t}, a_{t}, r_{t}, \phi_{t+1}\right)$ in $D$
Sample random minibatch of transitions $\left(\phi_{j}, a_{j}, r_{j}, \phi_{j+1}\right)$ from $D$…

BestR
- 183
- 1
- 7
4
votes
1 answer
How can I ensure convergence of DDQN, if the true Q-values for different actions in the same state are very close?
I am applying a Double DQN algorithm to a highly stochastic environment where some of the actions in the agent's action space have very similar "true" Q-values (i.e. the expected future reward from either of these actions in the current state is…

apitsch
- 93
- 9
4
votes
1 answer
In reinforcement learning, does the optimal value correspond to performing the best action in a given state?
I am confused about the definition of the optimal value ($V^*$) and optimal action-value ($Q^*$) in reinforcement learning, so I need some clarification, because some blogs I read on Medium and GitHub are inconsistent with the literature.
Originally, I…

Rui Nian
- 423
- 3
- 13
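For the question above, the textbook relationship between the two optimal value functions is
$$v_{*}(s) = \max_{a} q_{*}(s, a), \qquad q_{*}(s, a) = \mathbb{E}\bigl[R_{t+1} + \gamma\, v_{*}(S_{t+1}) \mid S_t = s, A_t = a\bigr],$$
i.e. the optimal value of a state is the value of the best action available in it, assuming optimal behaviour afterwards.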
4
votes
1 answer
An example of a unique value function which is associated with multiple optimal policies
In the 4th paragraph of
http://www.incompleteideas.net/book/ebook/node37.html
it is mentioned:
Whereas the optimal value functions for states and state-action pairs are unique for a given MDP, there can be many optimal policies
Could you please…

Melanie A
- 143
- 2
4
votes
2 answers
How do we get the optimal value-function?
Here it says (is it correct?) that:
$$V^\pi(s) = \sum_{a \in A}\pi(a|s)\,Q^\pi(s,a)$$
And we have:
$$ V^*(s) = \max_\pi V^\pi(s)$$
Also:
$$ V^*(s) = \max_a Q^*(s, a) $$
Can someone demonstrate to me step by step how we got from $ V^*(s) = \max_\pi…

Ness
- 216
- 1
- 8
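An informal sketch of the step asked for above: for any action distribution, $\sum_{a \in A} \pi(a \mid s)\, Q^{*}(s, a) \le \max_{a} Q^{*}(s, a)$, with equality when all probability mass sits on an $\arg\max$ action; since an optimal policy can always act greedily with respect to $Q^{*}$, the maximisation over policies collapses to a maximisation over actions:
$$V^{*}(s) = \max_{\pi} V^{\pi}(s) = \max_{a} Q^{*}(s, a).$$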
4
votes
1 answer
Why are policy gradient methods more effective in high-dimensional action spaces?
David Silver argues, in his Reinforcement Learning course, that policy-based reinforcement learning (RL) is more effective than value-based RL in high-dimensional action spaces. He points out that the implicit policy (e.g., $\epsilon$-greedy) in…

Saucy Goat
- 143
- 4