Questions tagged [bootstrapping]
For questions related to the concept of bootstrapping in reinforcement learning (and statistics).
4 questions
2 votes · 0 answers
How does bootstrapping work with the offline $\lambda$-return algorithm?
In Sutton and Barto's book, Reinforcement Learning: An Introduction (2nd edition), equation 12.2 on page 289 defines the $\lambda$-return as follows
$$G_t^{\lambda} = (1-\lambda)\sum_{n=1}^{\infty}…
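A minimal sketch of the idea behind the question (assuming a recorded episode, a value-estimate table `v_hat`, and the convention `rewards[t]` $= R_{t+1}$; all names here are illustrative, not from the question): each constituent n-step return bootstraps from the current value estimate before being mixed into the offline $\lambda$-return.

```python
def n_step_return(rewards, states, t, n, v_hat, gamma):
    """G_{t:t+n}: truncated return that bootstraps from v_hat at S_{t+n}."""
    T = len(rewards)                          # terminal time of the episode
    horizon = min(t + n, T)
    g = sum(gamma ** (k - t) * rewards[k] for k in range(t, horizon))
    if t + n < T:                             # bootstrap only if truncated before the end
        g += gamma ** n * v_hat[states[t + n]]
    return g

def offline_lambda_return(rewards, states, t, v_hat, gamma, lam):
    """Episodic lambda-return: (1 - lam) * sum_n lam^(n-1) * G_{t:t+n},
    with the remaining weight lam^(T-t-1) placed on the full return G_t."""
    T = len(rewards)
    g = (1 - lam) * sum(
        lam ** (n - 1) * n_step_return(rewards, states, t, n, v_hat, gamma)
        for n in range(1, T - t)
    )
    g += lam ** (T - t - 1) * n_step_return(rewards, states, t, T - t, v_hat, gamma)
    return g
```

Because every $G_{t:t+n}$ with $t+n < T$ ends in $\hat v(S_{t+n})$, the $\lambda$-return bootstraps from the value estimate at every horizon it mixes over.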

questions
- 384
- 1
- 8
1 vote · 1 answer
Doesn't the n-step Tree Backup algorithm negatively affect the DQN-Agent by creating inconsistent look-ahead targets?
In Sutton and Barto's textbook, on page 152, they introduce the n-step Tree Backup algorithm, where the tree-backup n-step return is defined via
$$G_{t:t+n} = R_{t+1} + \gamma \sum_{a \neq A_{t+1}} \pi(a | S_{t+1})Q_{t+n-1}(S_{t+1}, a) + \gamma…
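A minimal recursive sketch of that recursion (the `Q` table, the target policy `pi(a, s)`, the indexing convention `rewards[t]` $= R_{t+1}$, and `n_actions` are all assumptions for illustration, not part of the question):

```python
def tree_backup_return(rewards, states, actions, t, n, Q, pi, gamma, n_actions):
    """G_{t:t+n} for n-step Tree Backup."""
    T = len(rewards)                    # terminal time
    if t + 1 >= T:                      # next state is terminal: the return is just R_{t+1}
        return rewards[t]
    s1, a1 = states[t + 1], actions[t + 1]
    # expected value of every action NOT taken at S_{t+1}, weighted by the target policy
    untaken = sum(pi(a, s1) * Q[s1, a] for a in range(n_actions) if a != a1)
    if n == 1:
        taken = pi(a1, s1) * Q[s1, a1]  # one-step case: an Expected-SARSA-style target
    else:
        # recurse only along the action actually taken, with one step fewer
        taken = pi(a1, s1) * tree_backup_return(
            rewards, states, actions, t + 1, n - 1, Q, pi, gamma, n_actions)
    return rewards[t] + gamma * (untaken + taken)
```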

Peter
- 55
- 4
1 vote · 0 answers
Why do bootstrapping methods produce nonstationary targets more than non-bootstrapping methods?
The following quote is taken from the beginning of the chapter on "Approximate Solution Methods" (p. 198) in "Reinforcement Learning" by Sutton & Barto (2018):
reinforcement learning generally requires function approximation methods able to handle…
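A small illustration of the point that quote leads into (a hypothetical linear value function with weights `w`; all names are assumptions): a bootstrapped TD target contains the current weights, so it shifts every time the weights are updated, whereas a Monte Carlo target is just a sampled return and does not.

```python
import numpy as np

def td_target(r, x_next, w, gamma):
    # Bootstrapped target: uses the current estimate w . x(S'),
    # so it changes whenever w changes -> a nonstationary regression target.
    return r + gamma * np.dot(w, x_next)

def mc_target(sampled_return):
    # Monte Carlo target: a fixed sample of the actual return,
    # independent of the parameters being learned.
    return sampled_return
```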

Johan
- 121
- 4
0 votes · 1 answer
How is $Q(s', a')$ calculated in SARSA and Q-Learning?
I have a question about how to update the Q-function in Q-learning and SARSA. In the post "What are the differences between SARSA and Q-learning?", the following update formulas are given:
Q-Learning
$$Q(s,a) = Q(s,a) + \alpha (R_{t+1} + \gamma…
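A minimal tabular sketch (a `Q` dictionary keyed by state-action pairs and an ε-greedy behaviour policy; all names are illustrative, not from the question) contrasting how the two methods obtain $Q(s', a')$ for the target:

```python
import random

def epsilon_greedy(Q, s, actions, eps=0.1):
    """Behaviour policy used to pick actions (and, in SARSA, the next action a')."""
    if random.random() < eps:
        return random.choice(actions)
    return max(actions, key=lambda a: Q[s, a])

def q_learning_update(Q, s, a, r, s_next, actions, alpha, gamma):
    # Q-learning: Q(s', a') is the MAXIMUM over all actions in s' (off-policy target),
    # regardless of which action will actually be taken next.
    target = r + gamma * max(Q[s_next, b] for b in actions)
    Q[s, a] += alpha * (target - Q[s, a])

def sarsa_update(Q, s, a, r, s_next, a_next, alpha, gamma):
    # SARSA: Q(s', a') uses the action a' actually chosen by the behaviour policy
    # (e.g. via epsilon_greedy above), making the update on-policy.
    target = r + gamma * Q[s_next, a_next]
    Q[s, a] += alpha * (target - Q[s, a])
```

Terminal transitions, where the bootstrap term is dropped, are omitted for brevity.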

PeterBe
- 212
- 1
- 11