
In the algorithm below, when $\tau + n \geq T$, shouldn't the algorithm still bootstrap with the value of the next state? For instance, when $T=5$, $\tau=3$, and $n=2$, the sample return is not bootstrapped with $V(S_{\tau+n})$, i.e., $V(S_5)$, the value of the terminal state.

[Image: pseudocode of the algorithm]

Also, on line 4, what do we mean by "can take their index mod $n + 1$"?

user529295

1 Answer


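Because the value of the terminal state is 0 by definition: no further reward can be obtained once the terminal state is reached. Bootstrapping with it would only add a zero term to the return, so omitting that term when $\tau + n \geq T$ changes nothing.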
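To make this concrete, here is a minimal Python sketch using the numbers from the question ($T=5$, $\tau=3$, $n=2$). The function names, the reward and value arrays, and the discount `gamma` are all made up for illustration and are not taken from the algorithm in the question; the only assumption that matters is that the value of the terminal state is stored as 0.

```python
def n_step_return(rewards, values, tau, n, T, gamma=1.0):
    """n-step return for time tau, skipping the bootstrap when tau + n >= T.

    rewards[t] holds R_t (rewards[0] is unused); values[t] holds V(S_t).
    """
    last = min(tau + n, T)
    # Discounted sum of R_{tau+1}, ..., R_{min(tau+n, T)}.
    G = sum(gamma ** (i - tau - 1) * rewards[i] for i in range(tau + 1, last + 1))
    # Bootstrap only if the n-step window ends before the episode does.
    if tau + n < T:
        G += gamma ** n * values[tau + n]
    return G


def n_step_return_always_bootstrap(rewards, values, tau, n, T, gamma=1.0):
    """Same return, but always bootstrapping from the last state reached."""
    last = min(tau + n, T)
    G = sum(gamma ** (i - tau - 1) * rewards[i] for i in range(tau + 1, last + 1))
    # When last == T this adds gamma^k * V(terminal) = 0.
    G += gamma ** (last - tau) * values[last]
    return G


# Hypothetical episode of length T = 5; values[5] is the terminal state, so 0.
rewards = [0.0, 1.0, 0.0, 2.0, -1.0, 3.0]
values = [0.5, 0.4, 0.3, 0.2, 0.1, 0.0]

a = n_step_return(rewards, values, tau=3, n=2, T=5, gamma=0.9)
b = n_step_return_always_bootstrap(rewards, values, tau=3, n=2, T=5, gamma=0.9)
print(a == b)  # the two variants agree because V(terminal) = 0
```

Both variants give the same return, so skipping the bootstrap at the end of the episode is just a way of not adding zero.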

tnfru