Questions tagged [value-functions]

For questions related to the concept of a value (or performance, quality, or utility) function, as defined in reinforcement learning and other AI sub-fields. An example of this type of function is the Q function (used e.g. in the Q-learning algorithm), also known as the state-action value function, which is a mapping $Q: S \times A \rightarrow \mathbb{R}$, where $S$ and $A$ are, respectively, the sets of states and actions of the environment.
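
To make the signature concrete, here is a minimal, hypothetical sketch of a tabular Q function in Python (the states, actions, and values below are made up purely for illustration):

```python
from collections import defaultdict

# Hypothetical discrete state and action sets (made up for illustration).
states = ["s0", "s1", "s2"]
actions = ["left", "right"]

# A tabular Q function: one real number per (state, action) pair,
# initialised to 0.0 and updated by whatever algorithm is used (e.g. Q-learning).
Q = defaultdict(float)

def greedy_action(state):
    """Return the action with the highest estimated value in `state`."""
    return max(actions, key=lambda a: Q[(state, a)])

Q[("s0", "right")] = 1.0
print(greedy_action("s0"))  # -> right
```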

104 questions
10 votes · 1 answer

What is the difference between expected return and value function?

I've seen numerous mathematical explanations of reward, value functions $V(s)$, and return functions. The reward provides an immediate return for being in a specific state. The better the reward, the better the state. As I understand it, it can be…
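
For reference, the standard definitions the question is contrasting (in Sutton & Barto's notation): the return $G_t$ is a random variable, and the state-value function is its expectation under a policy $\pi$:
$$G_t = \sum_{k=0}^{\infty} \gamma^k R_{t+k+1}, \qquad v_\pi(s) = \mathbb{E}_\pi\left[G_t \mid S_t = s\right].$$
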
7 votes · 2 answers

In Value Iteration, why can we initialize the value function arbitrarily?

I have not been able to find a good explanation of this, other than statements that the algorithm is guaranteed to converge with arbitrary choices for initial values in each state. Is this something to do with the Bellman optimality constraint…
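
One way to see the convergence claim empirically: the Bellman optimality backup is a $\gamma$-contraction, so value iteration reaches the same fixed point from any starting values. A minimal sketch on a made-up 2-state MDP (all numbers are hypothetical):

```python
import numpy as np

# Made-up 2-state, 2-action MDP: P[a, s, s'] transition probabilities, R[s, a] rewards.
P = np.array([[[0.9, 0.1], [0.2, 0.8]],   # action 0
              [[0.5, 0.5], [0.1, 0.9]]])  # action 1
R = np.array([[1.0, 0.0],
              [0.0, 2.0]])
gamma = 0.9

def value_iteration(V0, iters=500):
    V = np.array(V0, dtype=float)
    for _ in range(iters):
        # Bellman optimality backup: V(s) <- max_a [ R(s,a) + gamma * sum_s' P(s'|s,a) V(s') ]
        V = np.max(R + gamma * np.einsum("ast,t->sa", P, V), axis=1)
    return V

# Two very different initialisations end up at the same fixed point V*.
print(value_iteration([0.0, 0.0]))
print(value_iteration([100.0, -100.0]))
```
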
7 votes · 2 answers

Why does the state-action value function, defined as an expected value of the reward and state value function, not need to follow a policy?

I often see that the state-action value function is expressed as: $$q_{\pi}(s,a)=\color{red}{\mathbb{E}_{\pi}}[R_{t+1}+\gamma G_{t+1} | S_t=s, A_t = a] = \color{blue}{\mathbb{E}}[R_{t+1}+\gamma v_{\pi}(s') |S_t = s, A_t =a]$$ Why does expressing the…
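
Briefly, the reason the blue expectation needs no policy subscript: once $S_t = s$ and $A_t = a$ are both given, the remaining randomness over $(R_{t+1}, S_{t+1})$ is governed entirely by the environment dynamics $p(s', r \mid s, a)$; the policy enters only through $v_\pi$ itself:
$$\mathbb{E}\left[R_{t+1} + \gamma v_{\pi}(S_{t+1}) \mid S_t = s, A_t = a\right] = \sum_{s', r} p(s', r \mid s, a)\left[r + \gamma v_{\pi}(s')\right].$$
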
7 votes · 1 answer

Why is the state-action value function used more than the state value function?

In reinforcement learning, the state-action value function seems to be used more than the state value function. Why is it so?
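
A common part of the answer: acting greedily with $Q(s, a)$ is a table lookup, whereas acting greedily with $V(s)$ requires a one-step lookahead through a model of the dynamics. A hypothetical sketch (the model tensors here are placeholders):

```python
import numpy as np

n_states, n_actions, gamma = 4, 2, 0.99

# With Q(s, a), greedy (model-free) action selection is a table lookup.
Q = np.zeros((n_states, n_actions))
def act_from_q(s):
    return int(np.argmax(Q[s]))

# With V(s) alone, we also need the dynamics P(s'|s,a) and rewards R(s,a)
# to do the one-step lookahead  argmax_a [ R(s,a) + gamma * sum_s' P(s'|s,a) V(s') ].
V = np.zeros(n_states)
P = np.full((n_actions, n_states, n_states), 1.0 / n_states)  # placeholder dynamics
R = np.zeros((n_states, n_actions))                           # placeholder rewards
def act_from_v(s):
    lookahead = R[s] + gamma * np.einsum("at,t->a", P[:, s, :], V)
    return int(np.argmax(lookahead))
```
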
6 votes · 1 answer

When to use the state value function $V(s)$ and when to use the state-action value function $Q(s, a)$?

I saw the difference between the value function $V(s)$ and $Q(s, a)$. But when do I use each one? When I coded in Matlab, I only used $Q(s, a)$ directly (as I was thinking of a tabular approach). So, when is one more beneficial than the other? I have a large…
6 votes · 2 answers

What is the Bellman Equation actually telling us?

What does the Bellman equation actually say? And are there many flavours of it? I get a little confused when I look up the Bellman equation, because people seem to say slightly different things about what it is. And I think the…
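
For orientation, the two most common flavours for the state-value function are the Bellman expectation equation (for a fixed policy $\pi$) and the Bellman optimality equation:
$$v_\pi(s) = \sum_a \pi(a \mid s) \sum_{s', r} p(s', r \mid s, a)\left[r + \gamma v_\pi(s')\right], \qquad v_*(s) = \max_a \sum_{s', r} p(s', r \mid s, a)\left[r + \gamma v_*(s')\right].$$
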
6 votes · 3 answers

What is the target Q-value in DQNs?

I understand that in DQNs, the loss is the MSE between the outputted Q-values and the target Q-values. What do the target Q-values represent? And how are they obtained/calculated by the DQN?
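
A minimal sketch of how the target is typically computed with a separate target network (the network and tensor names here are placeholders, not any specific library's API):

```python
import torch

gamma = 0.99

def dqn_targets(rewards, next_states, dones, target_net):
    """y = r                                   if the episode terminated at s',
       y = r + gamma * max_a Q_target(s', a)   otherwise."""
    with torch.no_grad():
        next_q = target_net(next_states).max(dim=1).values
    return rewards + gamma * (1.0 - dones.float()) * next_q

# The DQN loss is then e.g. the MSE between Q(s, a) for the taken actions and these targets.
```
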
5 votes · 2 answers

When to use Value Iteration vs. Policy Iteration

Both value iteration and policy iteration are Generalized Policy Iteration (GPI) algorithms. However, they differ in the mechanics of their updates. Policy Iteration seeks to first find a complete value function for a policy, then derive the Q…
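
A compact sketch of the contrast on a made-up 2-state MDP: policy iteration alternates exact policy evaluation with greedy improvement, while value iteration (see the sketch under the value-iteration question above) applies the optimality backup directly to the value estimates. All numbers here are hypothetical:

```python
import numpy as np

# Made-up 2-state, 2-action MDP: P[a, s, s'] transition probabilities, R[s, a] rewards.
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.5, 0.5], [0.1, 0.9]]])
R = np.array([[1.0, 0.0],
              [0.0, 2.0]])
gamma, n_states = 0.9, 2

def policy_iteration():
    policy = np.zeros(n_states, dtype=int)            # start from an arbitrary policy
    while True:
        # Policy evaluation: solve (I - gamma * P_pi) v = r_pi exactly for this policy.
        P_pi = P[policy, np.arange(n_states), :]
        r_pi = R[np.arange(n_states), policy]
        v = np.linalg.solve(np.eye(n_states) - gamma * P_pi, r_pi)
        # Policy improvement: act greedily with respect to the evaluated v.
        q = R + gamma * np.einsum("ast,t->sa", P, v)
        new_policy = np.argmax(q, axis=1)
        if np.array_equal(new_policy, policy):        # stable policy -> optimal
            return policy, v
        policy = new_policy

print(policy_iteration())
```
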
5 votes · 1 answer

How would I compute the optimal state-action value for a certain state and action?

I am currently trying to learn reinforcement learning and I started with the basic gridworld application. I tried Q-learning with the following parameters: learning rate = 0.1, discount factor = 0.95, exploration rate = 0.1, default reward = 0. The…
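
For context, this is the tabular Q-learning update behind the parameters listed (learning rate 0.1, discount factor 0.95, $\epsilon$-greedy exploration 0.1); the gridworld size here is an assumption:

```python
import random
import numpy as np

alpha, gamma, epsilon = 0.1, 0.95, 0.1   # the parameters listed in the question
n_states, n_actions = 16, 4              # e.g. a 4x4 gridworld (an assumption)
Q = np.zeros((n_states, n_actions))

def choose_action(s):
    if random.random() < epsilon:
        return random.randrange(n_actions)   # explore
    return int(np.argmax(Q[s]))              # exploit

def q_update(s, a, r, s_next, done):
    # Q(s,a) <- Q(s,a) + alpha * [ r + gamma * max_a' Q(s',a') - Q(s,a) ]
    target = r if done else r + gamma * np.max(Q[s_next])
    Q[s, a] += alpha * (target - Q[s, a])
```
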
5 votes · 4 answers

How to stop DQN Q function from increasing during learning?

Following the DQN algorithm with experience replay: store transition $\left(\phi_{t}, a_{t}, r_{t}, \phi_{t+1}\right)$ in $D$; sample random minibatch of transitions $\left(\phi_{j}, a_{j}, r_{j}, \phi_{j+1}\right)$ from $D$…
4 votes · 1 answer

How can I ensure convergence of DDQN, if the true Q-values for different actions in the same state are very close?

I am applying a Double DQN algorithm to a highly stochastic environment where some of the actions in the agent's action space have very similar "true" Q-values (i.e. the expected future reward from either of these actions in the current state is…
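
For reference, the Double DQN target the question refers to: the online network selects the next action and the target network evaluates it, which reduces the over-estimation bias that makes near-tied Q-values hard to separate. A sketch with placeholder network names:

```python
import torch

gamma = 0.99

def double_dqn_targets(rewards, next_states, dones, online_net, target_net):
    with torch.no_grad():
        # Select the next action with the online network ...
        best_actions = online_net(next_states).argmax(dim=1, keepdim=True)
        # ... but evaluate it with the target network.
        next_q = target_net(next_states).gather(1, best_actions).squeeze(1)
    return rewards + gamma * (1.0 - dones.float()) * next_q
```
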
4 votes · 1 answer

In reinforcement learning, does the optimal value correspond to performing the best action in a given state?

I am confused about the definition of the optimal value ($V^*$) and optimal action-value ($Q^*$) in reinforcement learning, so I need some clarification, because some blogs I read on Medium and GitHub are inconsistent with the literature. Originally, I…
4 votes · 1 answer

An example of a unique value function which is associated with multiple optimal policies

In the 4th paragraph of http://www.incompleteideas.net/book/ebook/node37.html it is mentioned: "Whereas the optimal value functions for states and state-action pairs are unique for a given MDP, there can be many optimal policies." Could you please…
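
A minimal instance of the quoted statement: if two actions tie under the optimal action-value function in some state, say $q_*(s, a_1) = q_*(s, a_2) = v_*(s)$, then the deterministic policy choosing $a_1$, the one choosing $a_2$, and any stochastic mixture of the two are all optimal, while $v_*$ and $q_*$ themselves are unchanged.
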
4 votes · 2 answers

How do we get the optimal value-function?

Here it says (is it correct?) that: $$V^\pi = \sum_{a \in A}\pi(a|s)\, Q^\pi(s,a)$$ And we have: $$V^*(s) = \max_\pi V^\pi(s)$$ Also: $$V^*(s) = \max_a Q^*(s, a)$$ Can someone demonstrate to me step by step how we got from $V^*(s) = \max_\pi…
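
The step being asked about, spelled out: for an optimal policy $\pi^*$,
$$V^*(s) = \sum_{a \in A} \pi^*(a|s)\, Q^*(s, a) \le \max_a Q^*(s, a),$$
and any policy that put positive probability on a non-maximising action could be strictly improved (policy improvement theorem), so the inequality is in fact an equality: $V^*(s) = \max_a Q^*(s, a)$.
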
4 votes · 1 answer

Why are policy gradient methods more effective in high-dimensional action spaces?

David Silver argues, in his Reinforcement Learning course, that policy-based reinforcement learning (RL) is more effective than value-based RL in high-dimensional action spaces. He points out that the implicit policy (e.g., $\epsilon$-greedy) in…