Is there any difference between reward and return in reinforcement learning?

Question

I am reading Sutton and Barto's book on reinforcement learning. I thought that reward and return were the same thing.

However, in Section 5.6 of the book, 3rd line, first paragraph, it is written:

Whereas in Chapter 2 we averaged rewards, in Monte Carlo methods we average returns.

What does it mean? Are rewards and returns different things?

score 6 · Accepted Answer · answered Jun 04 '20 at 04:22


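Yes, they are different. A reward is the scalar signal the agent receives at a single timestep. The return is the total discounted reward accumulated from the current timestep onward.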
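To make the distinction concrete, here is a minimal sketch of computing a return from a list of per-step rewards. The function name `discounted_return`, the discount factor, and the sample rewards are all made up for illustration; they are not taken from the book.

```python
def discounted_return(rewards, gamma=0.9):
    """Discounted return from the first timestep of one episode.

    `rewards` holds the per-step rewards in the order they were received
    (hypothetical example data, not from the book).
    """
    g = 0.0
    # Work backwards so each step applies G_t = R_{t+1} + gamma * G_{t+1}.
    for r in reversed(rewards):
        g = r + gamma * g
    return g


rewards = [1.0, 0.0, 2.0]  # three individual rewards
print(discounted_return(rewards, gamma=0.5))  # 1 + 0.5*0 + 0.25*2 = 1.5
```

Each element of `rewards` is a reward; the single number the function produces is the return.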


stoic-santiago


score 3 · Answer 2 · answered Jun 05 '20 at 17:34

As the accepted answer states, the return at the current timestep is equal to the sum of discounted rewards from all future timesteps until the end of the episode. In Chapter 5 of Sutton and Barto, returns must be used to estimate the state-value and action-value functions because episode lengths are unrestricted and may be greater than one. In contrast, Chapter 2 deals with the very special case of multi-armed bandits in which episode lengths are always equal to one: The agent begins each episode in a fixed start state, takes an action, receives a reward, and then the episode terminates and the agent begins the next episode at the same start state. Therefore, a return is equivalent to a reward in Chapter 2 because all episodes have length one.
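A small sketch of this point, assuming made-up episodes and a hypothetical helper `returns_per_step`: with a one-step (bandit-style) episode the return collapses to the reward, while for longer episodes a Monte Carlo style estimate averages the returns rather than the individual rewards.

```python
def returns_per_step(rewards, gamma=1.0):
    """Return the list of returns, one per timestep, for a single episode."""
    returns = []
    g = 0.0
    # Accumulate from the end of the episode back to the start.
    for r in reversed(rewards):
        g = r + gamma * g
        returns.append(g)
    returns.reverse()
    return returns


# Bandit-style episode of length one: the return is just the reward.
bandit_episode = [4.0]
assert returns_per_step(bandit_episode, gamma=0.9) == [4.0]

# Multi-step episodes (illustrative data): estimate the start state's value
# by averaging the return from the first timestep of each episode.
episodes = [[1.0, 0.0, 2.0], [0.0, 0.0, 4.0]]
start_returns = [returns_per_step(ep, gamma=0.5)[0] for ep in episodes]
value_estimate = sum(start_returns) / len(start_returns)
print(start_returns, value_estimate)  # [1.5, 1.0] 1.25
```

Averaging the raw rewards of the two longer episodes would give a different number, which is why the two quantities only coincide when every episode has length one.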
