Questions tagged [eligibility-traces]

For questions related to the reinforcement learning technique called "eligibility traces", which combines temporal-difference and Monte Carlo methods.

15 questions
6
votes
1 answer

Can TD($\lambda$) be used with deep reinforcement learning?

TD($\lambda$) is a way to interpolate between TD(0), bootstrapping over a single step, and TD(max), bootstrapping over the entire episode length, i.e. Monte Carlo. Reading the link above, I see that an eligibility trace is kept for each state in order…
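For readers skimming this tag, here is a minimal tabular TD($\lambda$) prediction sketch with accumulating traces, to make the "one trace per state" idea concrete; the `env.reset()`/`env.step(a)` interface and `policy(s)` are hypothetical placeholders, not taken from any of the linked posts.

```python
import numpy as np

def td_lambda(env, policy, n_states, episodes=500,
              alpha=0.1, gamma=0.99, lam=0.9):
    """Tabular TD(lambda) prediction with accumulating eligibility traces."""
    V = np.zeros(n_states)
    for _ in range(episodes):
        e = np.zeros(n_states)            # one eligibility trace per state
        s = env.reset()
        done = False
        while not done:
            a = policy(s)
            s_next, r, done = env.step(a)   # hypothetical (state, reward, done) interface
            delta = r + gamma * V[s_next] * (not done) - V[s]
            e[s] += 1.0                     # accumulate trace for the visited state
            V += alpha * delta * e          # every state updated in proportion to its trace
            e *= gamma * lam                # traces decay geometrically
            s = s_next
    return V
```

With `lam=0` this reduces to one-step TD(0); with `lam=1` the updates approach a Monte Carlo target.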
5
votes
1 answer

Why not more TD($\lambda$) in actor-critic algorithms?

Is there either an empirical or theoretical reason that actor-critic algorithms with eligibility traces have not been more fully explored? I was hoping to find a paper or implementation or both for continuous tasks (not episodic) in continuous…
4
votes
1 answer

How to apply or extend the $Q(\lambda)$ algorithm to semi-MDPs?

I want to model an SMDP such that time is discretized, the transition time between two states follows an exponential distribution, and there is no reward during the transition. What are the differences between $Q(\lambda)$…
3
votes
0 answers

How to implement REINFORCE with eligibility traces?

The pseudocode below is taken from Sutton and Barto's "Reinforcement Learning: An Introduction". It shows an actor-critic implementation with eligibility traces. My question is: if I set $\lambda^{\theta}=1$ and replace $\delta$ with the immediate…
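For context, here is a compact sketch of the kind of episodic actor-critic with eligibility traces that the book's pseudocode describes; the helpers `sample_action`, `v_hat`, `grad_v`, and `grad_log_pi` are hypothetical placeholders standing in for whatever differentiable actor and critic parameterizations are used.

```python
import numpy as np

def actor_critic_traces(env, sample_action, v_hat, grad_v, grad_log_pi,
                        theta, w, episodes=1000, gamma=0.99,
                        lam_theta=0.9, lam_w=0.9,
                        alpha_theta=1e-3, alpha_w=1e-2):
    """Episodic actor-critic with eligibility traces, after Sutton & Barto's
    pseudocode; helper functions are placeholders for the chosen models."""
    for _ in range(episodes):
        s = env.reset()
        z_theta = np.zeros_like(theta)   # actor trace
        z_w = np.zeros_like(w)           # critic trace
        I, done = 1.0, False
        while not done:
            a = sample_action(s, theta)
            s_next, r, done = env.step(a)
            v_next = 0.0 if done else v_hat(s_next, w)
            delta = r + gamma * v_next - v_hat(s, w)      # TD error
            z_w = gamma * lam_w * z_w + grad_v(s, w)
            z_theta = gamma * lam_theta * z_theta + I * grad_log_pi(s, a, theta)
            w = w + alpha_w * delta * z_w
            theta = theta + alpha_theta * delta * z_theta
            I *= gamma
            s = s_next
    return theta, w
```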
3
votes
0 answers

Why does weighting by lambda values that sum to 1 ensure convergence with eligibility traces?

In chapter 12 of Sutton and Barto's book, they state that if the weights sum to 1, then the equation's updates have "guaranteed convergence properties". Why does this actually ensure convergence? The full citation of the mentioned fragment is in Richard…
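For reference, the fact that the weights form a convex combination is just a geometric series (this is only the normalization step the book appeals to, not the convergence argument itself):
$$(1-\lambda)\sum_{n=1}^{\infty}\lambda^{n-1} = (1-\lambda)\cdot\frac{1}{1-\lambda} = 1, \qquad 0 \le \lambda < 1,$$
so the $\lambda$-return is a weighted average of $n$-step returns rather than an unbounded sum.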
2
votes
1 answer

Do eligibility traces and epsilon-greedy do the same task in different ways?

I understand that, in reinforcement learning algorithms such as Q-learning, to prevent selecting the actions with the greatest Q-values too quickly and to allow for exploration, we use eligibility traces. Here are some questions: Does $\epsilon$-greedy solve…
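As a point of contrast for questions like this one, $\epsilon$-greedy is purely an action-selection rule, while eligibility traces only change how the TD error is propagated back to earlier states (see the TD($\lambda$) sketch above); a minimal $\epsilon$-greedy sketch with illustrative names:

```python
import numpy as np

def epsilon_greedy(Q, s, n_actions, eps=0.1, rng=None):
    """With probability eps pick a uniformly random action (exploration),
    otherwise pick the action with the largest estimated Q-value."""
    if rng is None:
        rng = np.random.default_rng()
    if rng.random() < eps:
        return int(rng.integers(n_actions))
    return int(np.argmax(Q[s]))
```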
2
votes
0 answers

Watkins' $Q(\lambda)$ with function approximation: why is the gradient not considered when updating eligibility traces for the exploitation phase?

I'm implementing Watkins' $Q(\lambda)$ algorithm with function approximation (from the 2nd edition of Sutton & Barto). I am very confused about updating the eligibility traces because, at the beginning of chapter 9.3 "Control with Function Approximation",…
2
votes
1 answer

How to prove the formula for the eligibility trace operator in reinforcement learning?

I don't understand how the formula in the red circle is derived. The screenshot is taken from this paper.
2
votes
1 answer

How do I derive the gradient with respect to the parameters of the softmax policy?

The gradient of the softmax eligibility trace is given by the following: \begin{align} \nabla_{\theta} \log(\pi_{\theta}(a|s)) &= \phi(s,a) - \mathbb E[\phi (s, \cdot)]\\ &= \phi(s,a) - \sum_{a'} \pi(a'|s) \phi(s,a') \end{align} How is this equation…
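For completeness, assuming linear action preferences $h(s,a) = \theta^{\top}\phi(s,a)$ and the softmax policy $\pi_{\theta}(a|s) = e^{h(s,a)} / \sum_{b} e^{h(s,b)}$ (the standard softmax-in-action-preferences setting), the identity follows by differentiating the log:
$$\log \pi_{\theta}(a|s) = \theta^{\top}\phi(s,a) - \log\sum_{b} e^{\theta^{\top}\phi(s,b)}, \qquad \nabla_{\theta}\log \pi_{\theta}(a|s) = \phi(s,a) - \frac{\sum_{b} e^{\theta^{\top}\phi(s,b)}\,\phi(s,b)}{\sum_{b'} e^{\theta^{\top}\phi(s,b')}} = \phi(s,a) - \sum_{b}\pi_{\theta}(b|s)\,\phi(s,b).$$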
2
votes
1 answer

How can the $\lambda$-return be defined recursively?

The $\lambda$-return is defined as $$G_t^\lambda = (1-\lambda)\sum_{n=1}^\infty \lambda^{n-1}G_{t:t+n}$$ where $$G_{t:t+n} = R_{t+1}+\gamma R_{t+2}+\dots +\gamma^{n-1}R_{t+n} + \gamma^n\hat{v}(S_{t+n})$$ is the $n$-step return from time $t$. How can…
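For reference, splitting the $n=1$ term out of the sum and re-indexing the remainder yields the standard recursive form (a well-known identity from the same chapter):
$$G_t^{\lambda} = R_{t+1} + \gamma\Big[(1-\lambda)\,\hat{v}(S_{t+1}) + \lambda\, G_{t+1}^{\lambda}\Big].$$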
1
vote
0 answers

What is 'eligibility' in intuitive terms in TD($\lambda$) learning?

I am watching the lecture from Brown University (on Udemy), and I am at the portion on Temporal Difference Learning. In the pseudocode/algorithm of TD(1) (seen in the screenshot below), we initialise the eligibility $e(s) = 0$ for all states. Later…
1
vote
1 answer

Applying eligibility traces to a Q-learning algorithm does not improve results (and might not function well)

I am trying to apply eligibility traces to a currently working Q-learning algorithm. The reference code for the Q-learning algorithm was taken from this great blog by DeepLizard, but it does not include eligibility traces. Link to the code on Google…
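For situations like this, a common reference point is tabular Watkins' $Q(\lambda)$, which adds traces to Q-learning but cuts them after exploratory actions; below is a minimal sketch, assuming a hypothetical `env` with `reset()`/`step(a)` returning `(next_state, reward, done)`, not the code from the linked blog.

```python
import numpy as np

def watkins_q_lambda(env, n_states, n_actions, episodes=500,
                     alpha=0.1, gamma=0.99, lam=0.9, eps=0.1, seed=0):
    """Tabular Watkins' Q(lambda): Q-learning plus eligibility traces that are
    zeroed whenever an exploratory (non-greedy) action is taken."""
    rng = np.random.default_rng(seed)
    Q = np.zeros((n_states, n_actions))

    def eps_greedy(s):
        if rng.random() < eps:
            return int(rng.integers(n_actions))
        return int(np.argmax(Q[s]))

    for _ in range(episodes):
        e = np.zeros_like(Q)
        s = env.reset()
        a = eps_greedy(s)
        done = False
        while not done:
            s_next, r, done = env.step(a)
            a_next = eps_greedy(s_next)
            a_star = int(np.argmax(Q[s_next]))          # greedy action at s_next
            if Q[s_next, a_next] == Q[s_next, a_star]:
                a_star = a_next                          # break ties in favour of a_next
            target = 0.0 if done else gamma * Q[s_next, a_star]
            delta = r + target - Q[s, a]
            e[s, a] += 1.0                               # accumulating trace
            Q += alpha * delta * e
            if a_next == a_star:
                e *= gamma * lam                         # decay traces after a greedy action
            else:
                e[:] = 0.0                               # cut traces after an exploratory action
            s, a = s_next, a_next
    return Q
```

Cutting the traces on exploratory actions is what keeps the backups consistent with the greedy target policy, and it is also one reason traces sometimes give little visible benefit when exploration is frequent.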
1
vote
0 answers

How is the general return-based off-policy equation derived?

I'm wondering how the general return-based off-policy equation in Safe and efficient off-policy reinforcement learning is derived: $$\mathcal{R} Q(x, a):=Q(x, a)+\mathbb{E}_{\mu}\left[\sum_{t \geq 0} \gamma^{t}\left(\prod_{s=1}^{t}…
1
vote
0 answers

Eligibility traces in model-based reinforcement learning

In model-based reinforcement learning algorithms, a model of the environment is constructed in order to use samples efficiently, as in methods such as Dyna and Prioritized Sweeping. Moreover, eligibility traces help in learning (action) value functions…
0
votes
1 answer

How to deal with delay in reinforcement learning, an unclear case

According to the question How to deal with the time delay in reinforcement learning?, delays in reinforcement learning can be observation delays, action delays, or reward delays. I have a special case of delay, but I am not…