For questions related to the concept of a value (or performance, quality, or utility) function, as defined in reinforcement learning and other AI sub-fields. An example of this type of function is the Q function (used e.g. in the Q-learning algorithm), also known as the state-action value function, since $Q: S \times A \rightarrow \mathbb{R}$, where $S$ and $A$ are respectively the sets of states and actions of the environment.
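For reference, both value functions covered by this tag are usually defined as expected returns under a policy $\pi$:
$$v_{\pi}(s) = \mathbb{E}_{\pi}[G_t \mid S_t = s], \qquad q_{\pi}(s, a) = \mathbb{E}_{\pi}[G_t \mid S_t = s, A_t = a],$$
where $G_t = \sum_{k=0}^{\infty} \gamma^{k} R_{t+k+1}$ is the discounted return and $\gamma \in [0, 1)$ is the discount factor.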
Questions tagged [value-functions]
104 questions
10
votes
1 answer
What is the difference between expected return and value function?
I've seen numerous mathematical explanations of reward, value functions $V(s)$, and return functions. The reward provides an immediate return for being in a specific state. The better the reward, the better the state.
As I understand it, it can be…

user3168961
- 221
- 2
- 6
7
votes
2 answers
In Value Iteration, why can we initialize the value function arbitrarily?
I have not been able to find a good explanation of this, other than statements that the algorithm is guaranteed to converge with arbitrary choices for initial values in each state. Is this something to do with the Bellman optimality constraint…

Arham
- 73
- 3
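One way to see why the initialisation in the question above does not matter: value iteration repeatedly applies the Bellman optimality backup, which is a $\gamma$-contraction in the sup norm, so every starting point $V_0$ is pulled to the same fixed point $V^*$:
$$V_{k+1}(s) = \max_{a} \sum_{s', r} p(s', r \mid s, a)\,\bigl[r + \gamma\, V_k(s')\bigr], \qquad \lVert V_{k+1} - V^{*} \rVert_{\infty} \le \gamma\, \lVert V_k - V^{*} \rVert_{\infty}.$$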
7
votes
2 answers
Why does the state-action value function, defined as an expected value of the reward and state value function, not need to follow a policy?
I often see that the state-action value function is expressed as:
$$q_{\pi}(s,a)=\color{red}{\mathbb{E}_{\pi}}[R_{t+1}+\gamma G_{t+1} | S_t=s, A_t = a] = \color{blue}{\mathbb{E}}[R_{t+1}+\gamma v_{\pi}(s') |S_t = s, A_t =a]$$
Why does expressing the…

Daniel Wiczew
- 323
- 2
- 10
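A sketch of the step the question above is about: conditioning on $A_t = a$ fixes the first action, and all later actions are already averaged over $\pi$ inside $v_{\pi}$, so the remaining (blue) expectation is only over the environment dynamics:
$$\mathbb{E}_{\pi}[R_{t+1}+\gamma G_{t+1} \mid S_t=s, A_t=a] = \sum_{s', r} p(s', r \mid s, a)\,\bigl[r + \gamma\, \mathbb{E}_{\pi}[G_{t+1} \mid S_{t+1}=s']\bigr] = \mathbb{E}[R_{t+1}+\gamma\, v_{\pi}(S_{t+1}) \mid S_t=s, A_t=a].$$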
7
votes
1 answer
Why is the state-action value function used more than the state value function?
In reinforcement learning, the state-action value function seems to be used more than the state value function. Why is it so?

Bhuwan Bhatt
- 394
- 1
- 11
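One common reason, shown as a contrast: acting greedily with respect to $q$ needs no model of the dynamics, whereas acting greedily with respect to $v$ requires a one-step lookahead through the transition model $p(s', r \mid s, a)$:
$$\pi(s) = \arg\max_{a} q(s, a) \qquad \text{vs.} \qquad \pi(s) = \arg\max_{a} \sum_{s', r} p(s', r \mid s, a)\,\bigl[r + \gamma\, v(s')\bigr].$$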
6
votes
1 answer
When to use the state value function $V(s)$ and when to use the state-action value function $Q(s, a)$?
I saw the difference between the value function $V(s)$ and $Q(s, a)$. But when do I use each one? When I coded in Matlab I only used $Q(s, a)$ directly (as I was thinking of a tabular approach). So, when is one more beneficial than the other? I have a large…

knowledge_seeker
- 97
- 7
6
votes
2 answers
What is the Bellman equation actually telling us?
What does the Bellman equation actually say? And are there many flavours of it?
I get a little confused when I look up the Bellman equation, because I feel like people say slightly different things about what it is. And I think the…

Johnny
- 69
- 3
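The version most often meant in the question above is the Bellman expectation equation for $v_{\pi}$, which expresses a state's value recursively through its successor states:
$$v_{\pi}(s) = \sum_{a} \pi(a \mid s) \sum_{s', r} p(s', r \mid s, a)\,\bigl[r + \gamma\, v_{\pi}(s')\bigr];$$
the other common flavours are the analogous equation for $q_{\pi}$ and the optimality versions obtained by replacing the average over $\pi$ with $\max_{a}$.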
6
votes
3 answers
What is the target Q-value in DQNs?
I understand that in DQNs, the loss is measured by taking the MSE between the predicted Q-values and the target Q-values.
What do the target Q-values represent? And how are they obtained/calculated by the DQN?

BG10
- 113
- 1
- 7
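For the DQN question above, the target for a sampled transition $(s, a, r, s')$ is usually the one-step bootstrapped estimate computed with a separate, periodically copied target network $\theta^{-}$ (with $y = r$ at terminal states):
$$y = r + \gamma \max_{a'} Q(s', a'; \theta^{-}), \qquad \mathcal{L}(\theta) = \bigl(y - Q(s, a; \theta)\bigr)^{2}.$$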
5
votes
2 answers
When to use Value Iteration vs. Policy Iteration
Both value iteration and policy iteration are Generalised Policy Iteration (GPI) algorithms. However, they differ in the mechanics of their updates. Policy iteration seeks to first find a converged value function for a policy, then derive the Q…

SeeDerekEngineer
- 521
- 4
- 11
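A minimal tabular sketch of the contrast in the question above, assuming a hypothetical known model `P[s][a]` given as a list of `(prob, next_state, reward, done)` tuples (the convention used by Gym's toy-text environments): value iteration folds the max over actions into every sweep, while policy iteration alternates full evaluation of the current policy with greedy improvement.

```python
import numpy as np

def backup(P, V, s, a, gamma):
    # Expected one-step return of taking action a in state s under the assumed model P.
    return sum(p * (r + gamma * V[s2] * (not done)) for p, s2, r, done in P[s][a])

def value_iteration(P, n_states, n_actions, gamma=0.99, tol=1e-8):
    V = np.zeros(n_states)  # arbitrary initialisation; value iteration converges regardless
    while True:
        V_new = np.array([max(backup(P, V, s, a, gamma) for a in range(n_actions))
                          for s in range(n_states)])
        if np.max(np.abs(V_new - V)) < tol:
            return V_new
        V = V_new

def policy_iteration(P, n_states, n_actions, gamma=0.99, tol=1e-8):
    policy = np.zeros(n_states, dtype=int)
    while True:
        # Policy evaluation: iterate the Bellman expectation backup to convergence.
        V = np.zeros(n_states)
        while True:
            V_new = np.array([backup(P, V, s, policy[s], gamma) for s in range(n_states)])
            if np.max(np.abs(V_new - V)) < tol:
                break
            V = V_new
        # Policy improvement: act greedily with respect to the evaluated V.
        new_policy = np.array([int(np.argmax([backup(P, V, s, a, gamma)
                                              for a in range(n_actions)]))
                               for s in range(n_states)])
        if np.array_equal(new_policy, policy):
            return policy, V
        policy = new_policy
```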
5
votes
1 answer
How would I compute the optimal state-action value for a certain state and action?
I am currently trying to learn reinforcement learning and I started with the basic gridworld application. I tried Q-learning with the following parameters:
Learning rate = 0.1
Discount factor = 0.95
Exploration rate = 0.1
Default reward = 0
The…

Rim Sleimi
- 215
- 1
- 6
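With the parameters listed above, each step of tabular Q-learning would apply the standard update, and the optimal state-action value the question asks about is the fixed point it converges to (under the usual conditions on exploration and the learning rate):
$$Q(s, a) \leftarrow Q(s, a) + \alpha\,\bigl[r + \gamma \max_{a'} Q(s', a') - Q(s, a)\bigr], \qquad \alpha = 0.1,\; \gamma = 0.95.$$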
5
votes
4 answers
How to stop DQN Q function from increasing during learning?
Following the DQN algorithm with experience replay:
Store transition $\left(\phi_{t}, a_{t}, r_{t}, \phi_{t+1}\right)$ in $D$
Sample random minibatch of transitions $\left(\phi_{j}, a_{j}, r_{j}, \phi_{j+1}\right)$ from $D$…

BestR
- 183
- 1
- 7
4
votes
1 answer
How can I ensure convergence of DDQN, if the true Q-values for different actions in the same state are very close?
I am applying a Double DQN algorithm to a highly stochastic environment where some of the actions in the agent's action space have very similar "true" Q-values (i.e. the expected future reward from either of these actions in the current state is…

apitsch
- 93
- 9
4
votes
1 answer
In reinforcement learning, does the optimal value correspond to performing the best action in a given state?
I am confused about the definition of the optimal value ($V^*$) and optimal action-value ($Q^*$) in reinforcement learning, so I need some clarification, because some blogs I read on Medium and GitHub are inconsistent with the literature.
Originally, I…

Rui Nian
- 423
- 3
- 13
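For the question above, the textbook relationship between the two optimal value functions is
$$v_{*}(s) = \max_{a} q_{*}(s, a), \qquad q_{*}(s, a) = \mathbb{E}\bigl[R_{t+1} + \gamma\, v_{*}(S_{t+1}) \mid S_t = s, A_t = a\bigr],$$
i.e. the optimal value of a state is the value of the best action available in it, assuming optimal behaviour afterwards.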
4
votes
1 answer
An example of a unique value function which is associated with multiple optimal policies
In the 4th paragraph of
http://www.incompleteideas.net/book/ebook/node37.html
it is mentioned:
Whereas the optimal value functions for states and state-action pairs are unique for a given MDP, there can be many optimal policies
Could you please…

Melanie A
- 143
- 2
4
votes
2 answers
How do we get the optimal value-function?
Here it says (is it correct?) that:
$$V^\pi(s) = \sum_{a \in A}\pi(a|s)\,Q^\pi(s,a)$$
And we have:
$$ V^*(s) = \max_\pi V^\pi(s)$$
Also:
$$ V^*(s) = \max_a Q^*(s, a) $$
Can someone demonstrate to me step by step how we got from $ V^*(s) = \max_\pi…

Ness
- 216
- 1
- 8
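An informal sketch of the step asked for above: for any action distribution, $\sum_{a \in A} \pi(a \mid s)\, Q^{*}(s, a) \le \max_{a} Q^{*}(s, a)$, with equality when all probability mass sits on an $\arg\max$ action; since an optimal policy can always act greedily with respect to $Q^{*}$, the maximisation over policies collapses to a maximisation over actions:
$$V^{*}(s) = \max_{\pi} V^{\pi}(s) = \max_{a} Q^{*}(s, a).$$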
4
votes
1 answer
Why are policy gradient methods more effective in high-dimensional action spaces?
David Silver argues, in his Reinforcement Learning course, that policy-based reinforcement learning (RL) is more effective than value-based RL in high-dimensional action spaces. He points out that the implicit policy (e.g., $\epsilon$-greedy) in…

Saucy Goat
- 143
- 4