What is 'eligibility' in intuitive terms in TD($\lambda$) learning?


I am watching the lecture from Brown University (on Udemy) and have reached the section on Temporal Difference Learning.

In the pseudocode for TD(1) (shown in the screenshot below), we initialise the eligibility $e(s) = 0$ for all states. Later on, we decay this eligibility by a factor of $\gamma$.
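For reference, here is a rough sketch of the loop being described. It assumes the standard tabular TD($\lambda$) with accumulating eligibility traces, where traces decay by $\gamma\lambda$ each step, so TD(1) ($\lambda = 1$) decays them by $\gamma$ alone. The lecture's exact pseudocode is only in the screenshot, so the function name `td_lambda_episode` and the `(state, reward, next_state)` episode format are hypothetical choices made for illustration.

```python
from collections import defaultdict


def td_lambda_episode(episode, V, alpha=0.1, gamma=0.9, lam=1.0):
    """Run one episode of tabular TD(lambda) with accumulating traces (a sketch).

    episode: list of (state, reward, next_state); next_state is None at termination
             (hypothetical format chosen for this example).
    V: defaultdict(float) mapping state -> value estimate, updated in place.
    lam=1.0 gives TD(1), where each trace decays by gamma alone.
    """
    e = defaultdict(float)  # eligibility e(s) = 0 for all states at the start
    for s, r, s_next in episode:
        v_next = 0.0 if s_next is None else V[s_next]
        delta = r + gamma * v_next - V[s]  # one-step TD error for this transition
        e[s] += 1.0  # the state just visited becomes (more) eligible
        for state in list(e):
            # every previously visited state is updated in proportion to its eligibility
            V[state] += alpha * delta * e[state]
            # decay the trace; with lam = 1 (TD(1)) this is just the factor gamma
            e[state] *= gamma * lam
    return V
```

For example, on a two-step episode A → B → terminal with a reward of 1 on the last step, `td_lambda_episode([("A", 0.0, "B"), ("B", 1.0, None)], defaultdict(float), alpha=0.5, gamma=0.9)` updates `V["A"]` to 0.45 even though the TD error occurs on the transition out of B, because A still carries eligibility $\gamma = 0.9$ at that point. With `lam=0.0` (TD(0)), A's trace is already zero by then and `V["A"]` stays at 0.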

My question is: what does 'eligibility' mean in an intuitive sense? The previous dynamic programming algorithms (value iteration and policy iteration) do not have this concept, so why does it appear in TD? Is it because we are effectively sampling episodes here, unlike the earlier algorithms, which sweep over all states and all possible actions?

Insights welcome.

edited Mar 16 '22 at 12:05

nbro


asked Mar 16 '22 at 05:41

cgo

I recommend that you provide the link to the lecture's video and slides, not just the screenshot. – nbro Mar 16 '22 at 12:05


0 Answers