Why don't we bootstrap from the terminal state in the n-step temporal difference prediction update equation?

Question

In the algorithm below, when $\tau + n \geq T$, shouldn't the update still bootstrap with the value of the state $n$ steps ahead? For instance, with $T=5$, $\tau=3$, and $n=2$, the sample return is not bootstrapped with $V(S_{\tau+n})$, i.e., $V(S_5)$, the value of the terminal state.

[Image: pseudocode of the n-step TD prediction algorithm]

Also, on line 4, what do we mean by "can take their index mod $n + 1$"?

score 1 · Answer 1 · answered Aug 26 '21 at 18:37


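Because the value of the terminal state is 0 by definition: there is no further reward to be obtained once you reach the terminal state. A bootstrap term for the terminal state would therefore add nothing, so when $\tau + n \geq T$ the n-step return is simply the discounted sum of the remaining rewards up to time $T$.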
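This can be checked numerically. The sketch below is only an illustration: the function names, the reward and value numbers, and the discount $\gamma = 0.9$ are made up, and it assumes `rewards[t]` holds $R_t$ and `values[t]` holds $V(S_t)$ with the terminal value fixed at 0. It compares the return computed the way the question describes (no bootstrap once $\tau + n \geq T$) with a variant that always bootstraps, for the question's case $T=5$, $\tau=3$, $n=2$.

```python
def n_step_return(rewards, values, tau, n, T, gamma):
    """n-step return for time tau, bootstrapping only when tau + n < T."""
    end = min(tau + n, T)
    # Discounted sum of rewards R_{tau+1} ... R_{end}.
    G = sum(gamma ** (i - tau - 1) * rewards[i] for i in range(tau + 1, end + 1))
    if tau + n < T:
        G += gamma ** n * values[tau + n]
    return G


def n_step_return_always_bootstrap(rewards, values, tau, n, T, gamma):
    """Hypothetical variant: always add a bootstrap term, even at the terminal state."""
    end = min(tau + n, T)
    G = sum(gamma ** (i - tau - 1) * rewards[i] for i in range(tau + 1, end + 1))
    # If end == T this multiplies the terminal value, which is 0.
    G += gamma ** (end - tau) * values[end]
    return G


# Made-up episode of length T = 5 (index 0 of rewards is unused padding).
T, tau, n, gamma = 5, 3, 2, 0.9
rewards = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]   # R_1 ... R_5
values = [0.5, 0.4, 0.3, 0.2, 0.1, 0.0]    # V(S_0) ... V(S_5); V(S_5) = 0 (terminal)

g_plain = n_step_return(rewards, values, tau, n, T, gamma)
g_boot = n_step_return_always_bootstrap(rewards, values, tau, n, T, gamma)
print(g_plain, g_boot)
```

Both calls give the same number, because the extra term is $\gamma^2 \cdot V(S_5) = \gamma^2 \cdot 0$. Skipping the bootstrap near the end of the episode is just a way of not writing out a term that is always zero.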


tnfru
