Questions tagged [reward-to-go]

For questions about the concept of "reward-to-go", which comes up, for example, in the context of policy gradient methods. The "expected reward-to-go for all states" is sometimes used as a synonym for "value function". See e.g. the paper "Learning the Variance of the Reward-To-Go" (2016) by Aviv Tamar et al. for more details.
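
A common formalization (a sketch of the usual conventions; notation varies across sources, and some authors use the undiscounted case) is

$$\hat{R}_t \doteq \sum_{t'=t}^{T} \gamma^{\,t'-t}\, r_{t'}, \qquad V^\pi(s) = \mathbb{E}_\pi\!\left[\hat{R}_t \mid s_t = s\right],$$

i.e. the reward-to-go from time $t$ is the (discounted) sum of rewards from $t$ onward, and its expectation under the policy $\pi$, viewed as a function of the state, is the value function.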

4 questions
8 votes, 2 answers

Why does the "reward to go" trick in policy gradient methods work?

In the policy gradient method, there is a trick to reduce the variance of the policy gradient. Using causality, part of the sum over rewards is removed, so that each action is weighted only by the rewards received after it was taken (see here…
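
For concreteness, here is a minimal NumPy sketch of the two weightings, assuming an undiscounted, finite-horizon trajectory (reward_to_go is an illustrative helper, not a function from any particular library): the naive estimator weights every step's log-probability by the total return, while the reward-to-go estimator weights step t only by the suffix sum of rewards from t onward.

    import numpy as np

    def reward_to_go(rewards):
        # Suffix sums: rtg[t] = rewards[t] + rewards[t+1] + ... + rewards[-1]
        rtg = np.zeros(len(rewards))
        running = 0.0
        for t in reversed(range(len(rewards))):
            running += rewards[t]
            rtg[t] = running
        return rtg

    rewards = np.array([1.0, 0.0, 2.0, 3.0])

    # Naive estimator: every time step is weighted by the full return.
    naive_weights = np.full(len(rewards), rewards.sum())  # [6. 6. 6. 6.]

    # Causality ("reward-to-go") estimator: step t is weighted only by
    # the rewards from t onward.
    rtg_weights = reward_to_go(rewards)                   # [6. 5. 5. 3.]
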
6 votes, 1 answer

What is the return-to-go in reinforcement learning?

In reinforcement learning, the return is defined as some function of the rewards. For example, in the discounted return, the rewards received at later time steps are multiplied by increasingly smaller numbers, so that the rewards…
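
A small sketch of the discounted case (the recursive form rtg[t] = r[t] + gamma * rtg[t+1] is assumed here, which matches the usual definition; gamma is the discount factor): later rewards are scaled by increasing powers of gamma.

    import numpy as np

    def discounted_return_to_go(rewards, gamma):
        # rtg[t] = rewards[t] + gamma * rtg[t+1], computed backwards in time.
        rtg = np.zeros(len(rewards))
        running = 0.0
        for t in reversed(range(len(rewards))):
            running = rewards[t] + gamma * running
            rtg[t] = running
        return rtg

    print(discounted_return_to_go(np.array([1.0, 1.0, 1.0]), gamma=0.5))
    # [1.75 1.5  1.  ]  -- e.g. rtg[0] = 1 + 0.5*1 + 0.25*1
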
2 votes, 0 answers

What is the proof that "reward-to-go" reduces variance of policy gradient?

I am following OpenAI's Spinning Up tutorial, Part 3: Intro to Policy Optimization. It is mentioned there that reward-to-go reduces the variance of the policy gradient. While I understand the intuition behind it, I struggle to find a proof in…
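
For reference, the standard first step of such an argument (a sketch only; it establishes unbiasedness, while the variance comparison needs a separate, more careful argument) is that a reward $r_{t'}$ with $t' < t$ is already fixed by the history $h_t$ up to time $t$, combined with the score-function identity $\mathbb{E}_{a_t \sim \pi_\theta(\cdot \mid s_t)}\!\left[\nabla_\theta \log \pi_\theta(a_t \mid s_t)\right] = \nabla_\theta \sum_a \pi_\theta(a \mid s_t) = \nabla_\theta 1 = 0$:

$$\mathbb{E}_{\tau \sim \pi_\theta}\!\left[\nabla_\theta \log \pi_\theta(a_t \mid s_t)\, r_{t'}\right] = \mathbb{E}\!\left[r_{t'}\, \mathbb{E}\!\left[\nabla_\theta \log \pi_\theta(a_t \mid s_t) \mid h_t\right]\right] = 0 \quad \text{for } t' < t.$$

So dropping past rewards leaves the gradient estimator unbiased; showing that the variance does not increase (and typically decreases) is the part the question asks a proof for.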
1 vote, 1 answer

Why is the "reward to go" replaced by Q instead of V when transitioning from PG to actor-critic methods?

While transitioning from simple policy gradient to the actor-critic algorithm, most sources begin by replacing the "reward to go" with the state-action value function (see slide 5 here). I am not able to understand how this is mathematically…
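
One way to make the substitution precise (a sketch under the standard definitions): the sampled reward-to-go multiplies $\nabla_\theta \log \pi_\theta(a_t \mid s_t)$, so the relevant conditional expectation is taken given both the state and the action actually taken, which is exactly $Q^\pi$ rather than $V^\pi$:

$$\mathbb{E}_{\tau \sim \pi}\!\left[\sum_{t'=t}^{T} r_{t'} \,\middle|\, s_t, a_t\right] = Q^\pi(s_t, a_t), \qquad V^\pi(s_t) = \mathbb{E}_{a_t \sim \pi(\cdot \mid s_t)}\!\left[Q^\pi(s_t, a_t)\right].$$

Replacing the sampled reward-to-go with $V^\pi(s_t)$ would erase the dependence on the chosen action $a_t$, which is precisely what the gradient term needs in order to assign credit to actions.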