Questions tagged [experience-replay]
For questions related to the "experience replay" buffer (as used in Deep Q-Networks and similar works).
50 questions
18
votes
1 answer
How does LSTM in deep reinforcement learning differ from experience replay?
In the paper Deep Recurrent Q-Learning for Partially Observable MDPs, the author processed the Atari game frames with an LSTM layer at the end. My questions are:
How does this method differ from experience replay, as they both use past…

Kevin. Fang
- 353
- 1
- 2
- 7
14
votes
2 answers
How large should the replay buffer be?
I'm learning the DDPG algorithm by following the OpenAI Spinning Up document on DDPG, where it is written
In order for the algorithm to have stable behavior, the replay buffer should be large enough to contain a wide range of…

ycenycute
- 341
- 1
- 2
- 6
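
For the buffer-size question above, a minimal sketch of a fixed-capacity replay buffer may make the trade-off concrete: the `capacity` argument is the hyperparameter being asked about, and the default used here is only illustrative, not a recommendation from the Spinning Up docs.

```python
import random
from collections import deque


class ReplayBuffer:
    """Minimal fixed-capacity replay buffer (a sketch, not a tuned implementation)."""

    def __init__(self, capacity=1_000_000):  # 1e6 is a common choice, not a rule
        # Once the deque is full, appending silently evicts the oldest transition,
        # which is what bounds the "range of experiences" the buffer can hold.
        self.buffer = deque(maxlen=capacity)

    def store(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform random sampling, as in vanilla DQN/DDPG.
        return random.sample(self.buffer, batch_size)

    def __len__(self):
        return len(self.buffer)
```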
14
votes
3 answers
Why exactly do neural networks require i.i.d. data?
In reinforcement learning, successive states (actions and rewards) can be correlated. An experience replay buffer was used in the DQN architecture to avoid training the neural network (NN) that represents the $Q$ function with correlated (or…

nbro
- 39,006
- 12
- 98
- 176
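
For the i.i.d. question above, a toy sketch (with made-up episode/step tags standing in for real transitions) illustrates the decorrelation argument: consecutive transitions come from a single episode, while a uniform sample from a large buffer mixes many episodes.

```python
import random

# Tag each toy "transition" with the episode it came from.
buffer = [(episode, step) for episode in range(100) for step in range(200)]

# What an online agent would train on: 32 consecutive, highly correlated transitions.
online_batch = buffer[5_000:5_032]

# What experience replay trains on: 32 transitions sampled uniformly from the buffer.
replay_batch = random.sample(buffer, 32)

print("episodes in the online batch:", {ep for ep, _ in online_batch})   # a single episode
print("episodes in the replay batch:", {ep for ep, _ in replay_batch})   # many episodes
```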
8
votes
1 answer
Is Experience Replay like dreaming?
Drawing parallels between Machine Learning techniques and the human brain is a risky exercise. When it is done successfully, it can be a powerful tool for popularization, but when it is done without precaution, it can lead to major…

16Aghnar
- 591
- 2
- 10
8
votes
2 answers
What is experience replay in layman's terms?
I've been reading Google DeepMind's Atari paper and I'm trying to understand the concept of "experience replay". Experience replay comes up in a lot of other reinforcement learning papers (particularly the AlphaGo paper), so I want to understand…

user491626
- 241
- 1
- 4
7
votes
1 answer
In imitation learning, do you simply inject optimal tuples of experience $(s, a, r, s')$ into your experience replay buffer?
Because my RL algorithm has difficulty learning some control actions, I've decided to use imitation learning/apprenticeship learning to guide my RL agent toward the optimal actions. I've read a few articles on the subject and just want to confirm…

Rui Nian
- 423
- 3
- 13
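
For the imitation-learning question above, here is a minimal sketch of the "inject expert tuples into the buffer" idea the asker describes; the `expert_transitions` list is a toy stand-in for demonstrations gathered offline, and this does not attempt DQfD-style supervised losses.

```python
import random
from collections import deque

# Toy stand-in: in practice these (s, a, r, s', done) tuples come from an expert or controller log.
expert_transitions = [((0.0,), 1, 1.0, (0.1,), False) for _ in range(500)]

buffer = deque(maxlen=100_000)

# 1) Inject the optimal tuples before RL training starts.
for transition in expert_transitions:
    buffer.append(transition)

# 2) Ordinary off-policy updates then draw minibatches that mix expert
#    experience with whatever the agent adds to the buffer later.
batch = random.sample(buffer, 32)
```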
7
votes
2 answers
Which kind of prioritized experience replay should I use?
The Prioritized Experience Replay paper gives two different ways of sampling from the replay buffer. One, called "proportional prioritization", assigns each transition a priority proportional to its TD-error.
$$p_i = |\delta_i|+\epsilon$$
The…

Philip Raeisghasem
- 2,028
- 9
- 29
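
For the prioritized-replay question above, a short sketch of the proportional variant: priorities follow $p_i = |\delta_i| + \epsilon$ and sampling probabilities are proportional to $p_i^\alpha$, where $\alpha$ is the prioritization exponent from the same paper. The TD-errors and constants below are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)

td_errors = rng.normal(size=1_000)      # toy TD-errors for the transitions in the buffer
eps, alpha = 1e-6, 0.6                  # illustrative constants, not tuned values

priorities = np.abs(td_errors) + eps                      # p_i = |delta_i| + eps
probs = priorities**alpha / np.sum(priorities**alpha)     # P(i) proportional to p_i^alpha

# Proportional prioritization: sample minibatch indices with probability P(i).
batch_idx = rng.choice(len(td_errors), size=32, p=probs)
```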
5
votes
1 answer
How does being on-policy prevent us from using the replay buffer with the policy gradients?
One of the approaches to improving the stability of the Policy Gradient family of methods is to use multiple environments in parallel. The reason behind this is the fundamental problem we discussed in Chapter 6, Deep Q-Network, when we talked about…

jgauth
- 161
- 10
4
votes
0 answers
Where does this variation of the importance sampling weight come from?
I have seen a variation of the importance sampling (IS) weight from Prioritized Experience Replay (PER) in some implementations, compared with the original paper's approach stated (in Section 3.4) as:
$$
w_{i}=\left(\frac{1}{N} \cdot…

HenDoNR
- 81
- 4
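
Without presuming to identify the specific variation the question refers to, the following sketch writes out the Section 3.4 weight as it appears in the original paper, $w_i = \left(\frac{1}{N}\cdot\frac{1}{P(i)}\right)^{\beta}$, together with the $1/\max_i w_i$ normalization the paper also mentions. The toy priorities are an assumption made only to keep the snippet runnable.

```python
import numpy as np

rng = np.random.default_rng(0)

N = 1_000                                  # number of transitions in the buffer
beta = 0.4                                 # IS exponent, annealed toward 1 in the paper

priorities = rng.random(N)                 # toy p_i values
probs = priorities / priorities.sum()      # toy P(i); the full method uses P(i) proportional to p_i^alpha

w = (1.0 / (N * probs)) ** beta            # w_i = (1/N * 1/P(i))^beta, as in Section 3.4
w /= w.max()                               # normalize by 1/max_i w_i so updates are only scaled down
```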
4
votes
1 answer
Why do DQNs tend to forget?
Why do DQNs tend to forget? Is it because when you feed them highly correlated samples, your model (the function approximator) doesn't learn a general solution?
For example:
I use level 1 experiences, my model $p$ is fitted to learn how to play that…

Chukwudi
- 349
- 2
- 7
4
votes
1 answer
What does the notation $p_t = \max_{i<t} p_i$ mean?
I am having a hard time converting line 6 of the prioritized experience replay algorithm from the original paper into plain English (see below):
I understand that new transitions (not visited before) are given maximal priority. On line 6 this would…

Hanzy
- 499
- 3
- 10
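
For the notation question above, a tiny sketch of how line 6 is usually implemented: a newly stored transition inherits the largest priority seen so far, so it is guaranteed to be replayed at least once. The fallback of 1.0 for an empty buffer is an implementation choice, not something specified in the paper.

```python
priorities = [0.5, 1.3, 0.2]          # priorities p_i of transitions already stored

# Line 6: p_t = max_{i < t} p_i, i.e. the new transition gets the maximal existing priority.
p_new = max(priorities) if priorities else 1.0
priorities.append(p_new)              # the new transition now has priority 1.3
```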
4
votes
1 answer
Experience Replay Not Always Giving Better Results
I have recently started working on a control problem using a Deep Q Network as proposed by DeepMind (https://arxiv.org/abs/1312.5602). Initially, I implemented it without Experience Replay. The results were very satisfying, although after…

George Papagiannis
- 41
- 4
3
votes
1 answer
When using experience replay, do we update the parameters for all samples of the mini-batch or for each sample in the mini-batch separately?
I've been reading Google DeepMind's Atari paper and I'm trying to understand how to implement experience replay.
Do we update the parameters $\theta$ of function $Q$ once for all the samples of the minibatch, or do we do that for each sample of the…

user491626
- 241
- 1
- 4
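
For the minibatch question above, the usual practice in DQN implementations is a single gradient step on the loss averaged over the whole minibatch, rather than one update per sample. The sketch below uses PyTorch with a toy linear network and made-up Bellman targets purely to show what that single update looks like.

```python
import torch

q_net = torch.nn.Linear(4, 2)                                # stand-in for the Q-network
optimizer = torch.optim.SGD(q_net.parameters(), lr=1e-3)

states = torch.randn(32, 4)                                  # minibatch of 32 sampled states
actions = torch.randint(0, 2, (32, 1))                       # actions taken in those states
targets = torch.randn(32)                                    # toy y_i from the Bellman backup

# One parameter update on theta using the mean loss over the whole minibatch.
q_values = q_net(states).gather(1, actions).squeeze(1)
loss = torch.nn.functional.mse_loss(q_values, targets)

optimizer.zero_grad()
loss.backward()
optimizer.step()
```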
3
votes
1 answer
Why is a large replay buffer inefficient?
OpenAI Spinning Up says
... the replay buffer should be large enough to contain a wide range of experiences, but it may not always be good to keep everything. If you only use the very-most recent data, you will overfit to that and things will break;…

Sara
- 131
- 3
3
votes
0 answers
Why is it necessary to divide the priority range according to the batch size in Prioritized Experience Replay?
According to DeepMinds's paper Prioritized Experience Replay (2016), specifically Appendix B.2.1 "Proportional prioritization" (p. 13), one should equally divide the priority range $[0, p_\text{total}]$ into $k$ ranges, where $k$ is the size of the…

Firas_
- 31
- 2
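
For the question above about dividing the priority range, here is a sketch of the stratified sampling described in Appendix B.2.1: split $[0, p_\text{total}]$ into $k$ equal segments (one per minibatch slot) and draw one sample uniformly inside each segment, so every minibatch spreads over the whole priority mass instead of clustering. The priorities are random toy values, and the `searchsorted` over a cumulative sum stands in for the $O(\log n)$ sum-tree lookup used in practice.

```python
import numpy as np

rng = np.random.default_rng(0)

priorities = rng.random(1_000)                 # toy p_i values for the buffered transitions
p_total = priorities.sum()
k = 32                                         # minibatch size = number of segments

# Split [0, p_total] into k equal ranges and sample one point uniformly in each.
bounds = np.linspace(0.0, p_total, k + 1)
samples = rng.uniform(bounds[:-1], bounds[1:])

# Map each sampled mass value back to a transition index.
cumulative = np.cumsum(priorities)
batch_idx = np.searchsorted(cumulative, samples)
batch_idx = np.minimum(batch_idx, len(priorities) - 1)   # guard the floating-point edge case
```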