
I am reading Sutton and Barto's book on reinforcement learning. I thought that reward and return were the same thing.

However, in Section 5.6 of the book, 3rd line, first paragraph, it is written:

"Whereas in Chapter 2 we averaged rewards, in Monte Carlo methods we average returns."

What does it mean? Are rewards and returns different things?

nbro
SJa

2 Answers


A reward is the single scalar signal received at one timestep. The return is the total discounted reward accumulated from the current timestep onward.
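To make the distinction concrete, here is a minimal sketch of how a return could be computed from a list of per-step rewards. The function name `discounted_return`, the `gamma` parameter, and the sample rewards are my own choices for illustration, not something taken from the book.

```python
def discounted_return(rewards, gamma=1.0):
    """Sum of discounted rewards from the current timestep to episode end.

    `rewards` holds the rewards received after the current timestep,
    in order; `gamma` is the discount factor (assumed to lie in [0, 1]).
    """
    g = 0.0
    # Work backwards using the recursion G_t = R_{t+1} + gamma * G_{t+1}.
    for r in reversed(rewards):
        g = r + gamma * g
    return g


# Hypothetical three-step episode: each entry is one reward.
rewards = [1.0, 0.0, 2.0]
print(discounted_return(rewards, gamma=0.5))  # 1 + 0.5*0 + 0.25*2 = 1.5
```

Each element of `rewards` is a reward; the single number the function gives back is the return.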

stoic-santiago

As the accepted answer states, the return at the current timestep is equal to the sum of discounted rewards from all future timesteps until the end of the episode. In Chapter 5 of Sutton and Barto, returns must be used to estimate the state-value and action-value functions because episode lengths are unrestricted and may be greater than one. In contrast, Chapter 2 deals with the very special case of multi-armed bandits in which episode lengths are always equal to one: The agent begins each episode in a fixed start state, takes an action, receives a reward, and then the episode terminates and the agent begins the next episode at the same start state. Therefore, a return is equivalent to a reward in Chapter 2 because all episodes have length one.
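Written out, and assuming the book's usual episodic notation (final timestep $T$, discount factor $\gamma$), the argument above looks roughly like this:

```latex
% Return from timestep t in an episode that ends at timestep T
G_t = R_{t+1} + \gamma R_{t+2} + \gamma^2 R_{t+3} + \dots
    = \sum_{k=t+1}^{T} \gamma^{\,k-t-1} R_k

% Bandit case: every episode has length one, so T = t + 1
G_t = \sum_{k=t+1}^{t+1} \gamma^{\,k-t-1} R_k = \gamma^{0} R_{t+1} = R_{t+1}
```

With only one term in the sum, the discount factor drops out and the return reduces to the single reward, which is why averaging rewards and averaging returns coincide in the bandit setting.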

DeepQZero