Questions tagged [td-lambda]
For questions related to the TD($\lambda$) family of algorithms.
12 questions
9
votes
2 answers
What is the intuition behind TD($\lambda$)?
I'd like to better understand temporal-difference learning. In particular, I'm wondering if it is prudent to think about TD($\lambda$) as a type of "truncated" Monte Carlo learning?

Nick Kunz
- 145
- 1
- 5
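For orientation (this is the standard forward-view definition, not quoted from the question): the $\lambda$-return is a geometrically weighted mixture of $n$-step returns,
$$G_t^{\lambda} = (1-\lambda)\sum_{n=1}^{\infty} \lambda^{n-1} G_{t:t+n},$$
so $\lambda = 0$ reduces to the one-step TD(0) target and $\lambda \to 1$ to the full Monte Carlo return, which is where the "between TD and Monte Carlo" intuition comes from.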
8
votes
2 answers
Why are lambda returns so rarely used in policy gradients?
I've seen the Monte Carlo return $G_{t}$ being used in REINFORCE and the TD($0$) target $r_t + \gamma Q(s', a')$ in vanilla actor-critic. However, I've never seen someone use the lambda return $G^{\lambda}_{t}$ in these situations, nor in any other…

jhinGhin
- 83
- 3
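As a point of reference (a sketch of the substitution the question describes, not an algorithm named in it), using the $\lambda$-return in a REINFORCE-style gradient would simply replace $G_t$:
$$\nabla_\theta J(\theta) \approx \mathbb{E}\big[\, G_t^{\lambda} \, \nabla_\theta \log \pi_\theta(a_t \mid s_t) \big].$$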
6
votes
1 answer
Can TD($\lambda$) be used with deep reinforcement learning?
TD($\lambda$) is a way to interpolate between TD(0), which bootstraps over a single step, and TD(max), which bootstraps over the entire episode length, i.e., Monte Carlo.
Reading the link above, I see that an eligibility trace is kept for each state in order…

Gulzar
- 729
- 1
- 8
- 23
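For reference, a minimal tabular sketch of TD($\lambda$) with accumulating eligibility traces; the Gym-style environment interface, variable names, and hyperparameters are illustrative assumptions, not taken from the question:

```python
import numpy as np

def td_lambda_episode(env, V, alpha=0.1, gamma=0.99, lam=0.9):
    """Run one episode of tabular TD(lambda) with accumulating eligibility traces.

    Assumes a Gym-style environment with integer states: env.reset() -> s and
    env.step(a) -> (s_next, reward, done, info); V is a 1-D numpy array of
    state-value estimates.
    """
    e = np.zeros_like(V)                       # one eligibility trace per state
    s = env.reset()
    done = False
    while not done:
        a = env.action_space.sample()          # placeholder behaviour policy
        s_next, r, done, _ = env.step(a)
        target = r if done else r + gamma * V[s_next]
        delta = target - V[s]                  # one-step TD error
        e[s] += 1.0                            # accumulate trace for the visited state
        V += alpha * delta * e                 # credit every recently visited state
        e *= gamma * lam                       # decay all traces
        s = s_next
    return V
```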
5
votes
1 answer
Why not more TD($\lambda$) in actor-critic algorithms?
Is there either an empirical or theoretical reason that actor-critic algorithms with eligibility traces have not been more fully explored? I was hoping to find a paper or implementation or both for continuous tasks (not episodic) in continuous…

Nick Kunz
- 145
- 1
- 5
5
votes
2 answers
Why am I getting the incorrect value of lambda?
I am trying to solve for $\lambda$ using temporal-difference learning. More specifically, I am trying to figure out what $\lambda$ I need such that $\text{TD}(\lambda)=\text{TD}(1)$ after one iteration. But I get the incorrect value of…

Amanda
- 205
- 1
- 5
2
votes
0 answers
How does bootstrapping work with the offline $\lambda$-return algorithm?
In Barto and Sutton's book, Reinforcement Learning: An Introduction (2nd edition), equation 12.2 on page 289 introduces the $\lambda$-return, defined as follows
$$G_t^{\lambda} = (1-\lambda)\sum_{n=1}^{\infty}…

quest ions
- 384
- 1
- 8
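For context, the episodic form of the same definition (equation 12.3 in the second edition, as I recall it) makes the terminal bootstrapping explicit by collecting all remaining weight onto the conventional return:
$$G_t^{\lambda} = (1-\lambda)\sum_{n=1}^{T-t-1} \lambda^{n-1} G_{t:t+n} + \lambda^{T-t-1} G_t.$$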
2
votes
0 answers
Why is TD(0) not converging to the optimal policy?
I am trying to implement the basic RL algorithms to learn on this 10x10 GridWorld (from REINFORCEJS by Karpathy).
Currently I am stuck at TD(0). No matter how many episodes I run, when I am updating the policy after all episodes are done according…

PeeteKeesel
- 121
- 3
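For reference, the tabular TD(0) prediction update the question refers to, as a minimal sketch (variable names are illustrative):

```python
def td0_update(V, s, r, s_next, done, alpha=0.1, gamma=0.9):
    """One tabular TD(0) prediction update:
    V(s) <- V(s) + alpha * (r + gamma * V(s') - V(s))."""
    target = r if done else r + gamma * V[s_next]
    V[s] += alpha * (target - V[s])
    return V
```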
2
votes
1 answer
How is $\Delta$ updated in true online TD($\lambda$)?
In section 7.4 of the RL textbook by Sutton & Barto, the authors discuss "true online TD($\lambda$)". The figure below (7.10 in the book) shows the algorithm.
At the end of each step, $V_{old} \leftarrow V(S')$ and also $S \leftarrow S'$. When…

roy
- 53
- 3
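For context, the core updates of true online TD($\lambda$) with linear function approximation, as I recall them from the published second edition (the notation here is my paraphrase, not the question's figure): with $V = \mathbf{w}^\top \mathbf{x}$, $V' = \mathbf{w}^\top \mathbf{x}'$ and $\delta = R + \gamma V' - V$,
$$\mathbf{z} \leftarrow \gamma\lambda\,\mathbf{z} + \big(1 - \alpha\gamma\lambda\,\mathbf{z}^\top\mathbf{x}\big)\,\mathbf{x}, \qquad \mathbf{w} \leftarrow \mathbf{w} + \alpha\big(\delta + V - V_{old}\big)\,\mathbf{z} - \alpha\big(V - V_{old}\big)\,\mathbf{x},$$
after which $V_{old} \leftarrow V'$ and $\mathbf{x} \leftarrow \mathbf{x}'$, matching the two assignments the question quotes.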
1
vote
0 answers
When do you back-propagate errors through a neural network when using TD($\lambda$)?
I have a neural network that I want to use to self-play Connect Four. The neural network receives the board state and is to provide an estimate of the state's value.
I would then, for each move, use the highest estimate; occasionally I will use…

NeomerArcana
- 210
- 3
- 12
1
vote
0 answers
What is 'eligibility' in intuitive terms in TD($\lambda$) learning?
I am watching the lecture from Brown University (on Udemy) and I am at the portion on Temporal Difference Learning.
In the pseudocode/algorithm of TD(1) (seen in the screenshot below), we initialise the eligibility $e(s) = 0$ for all states. Later…

cgo
- 175
- 5
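In intuitive terms (a standard restatement, not quoted from the lecture): a state's eligibility records how recently and how frequently it was visited, and each new TD error is credited back to every state in proportion to its current eligibility. With accumulating traces the per-step update is
$$e(s) \leftarrow \gamma\lambda\, e(s) \ \text{ for all } s, \qquad e(S_t) \leftarrow e(S_t) + 1,$$
so a trace decays geometrically after a visit and is bumped up each time the state is revisited.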
1
vote
0 answers
How is the general return-based off-policy equation derived?
I'm wondering how the general return-based off-policy equation in Safe and efficient off-policy reinforcement learning is derived
$$\mathcal{R} Q(x, a):=Q(x, a)+\mathbb{E}_{\mu}\left[\sum_{t \geq 0} \gamma^{t}\left(\prod_{s=1}^{t}…

fish_tree
- 247
- 1
- 6
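For context, the operator in question (from Munos et al., 2016, reproduced here from memory, so treat the details with care) applies an off-policy correction with generic trace coefficients $c_s$:
$$\mathcal{R} Q(x, a) := Q(x, a) + \mathbb{E}_{\mu}\left[\sum_{t \geq 0} \gamma^{t}\left(\prod_{s=1}^{t} c_{s}\right)\left(r_{t} + \gamma\, \mathbb{E}_{\pi} Q(x_{t+1}, \cdot) - Q(x_{t}, a_{t})\right)\right],$$
where $\mathbb{E}_{\pi} Q(x, \cdot) = \sum_{a} \pi(a \mid x)\, Q(x, a)$; particular choices of $c_s$ (e.g. $c_s = \lambda \min\!\left(1, \frac{\pi(a_s \mid x_s)}{\mu(a_s \mid x_s)}\right)$ for Retrace($\lambda$)) recover the different algorithms analysed in the paper.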
0
votes
1 answer
When using TD(λ), how do you calculate the eligibility trace per input & weight of a neural network neuron?
I have a neural network; each neuron is made up of inputs, weights, and an output. I have potentially multiple hidden layers. The activation function applied to the output is not known by the neuron.
I would like to use TD(λ) to back-propagate…

NeomerArcana
- 210
- 3
- 12
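One common approach, sketched here under assumptions (semi-gradient TD($\lambda$) with one eligibility trace per network parameter; value_net, traces, and the PyTorch-style interface are illustrative, not from the question):

```python
import torch

def semi_gradient_td_lambda_step(value_net, traces, s, r, s_next, done,
                                 alpha=1e-3, gamma=0.99, lam=0.8):
    """One semi-gradient TD(lambda) step with one eligibility trace per weight.

    `value_net` maps a state tensor to a scalar value; `traces` is a list of
    tensors shaped like value_net.parameters(), initialised to zeros.
    """
    v_s = value_net(s).squeeze()               # V(s; w), keeps the graph for backprop
    with torch.no_grad():
        v_next = torch.zeros(()) if done else value_net(s_next).squeeze()
        delta = r + gamma * v_next - v_s       # TD error; target is not differentiated

    value_net.zero_grad()
    v_s.backward()                             # p.grad now holds dV(s; w)/dw for each weight

    with torch.no_grad():
        for p, z in zip(value_net.parameters(), traces):
            z.mul_(gamma * lam).add_(p.grad)   # z <- gamma*lam*z + dV/dw
            p.add_(alpha * delta * z)          # w <- w + alpha * delta * z
```

In this sketch the traces would typically be reset at the start of each episode, e.g. `traces = [torch.zeros_like(p) for p in value_net.parameters()]`.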