Questions tagged [experience-replay]

For questions related to the "experience replay" buffer (as used in the Deep Q Network and similar works).
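For context, a minimal sketch of what such a buffer typically looks like (the class and method names below are illustrative, not taken from any particular library):

```python
import random
from collections import deque

class ReplayBuffer:
    """A minimal experience replay buffer: store transitions, sample uniformly."""

    def __init__(self, capacity=100_000):
        # Oldest transitions are discarded once capacity is reached.
        self.buffer = deque(maxlen=capacity)

    def add(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform random sampling breaks the temporal correlation
        # between consecutive transitions.
        return random.sample(self.buffer, batch_size)

    def __len__(self):
        return len(self.buffer)
```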

50 questions
18
votes
1 answer

How does LSTM in deep reinforcement learning differ from experience replay?

In the paper Deep Recurrent Q-Learning for Partially Observable MDPs, the authors process the Atari game frames with an LSTM layer at the end. My questions are: How does this method differ from experience replay, as they both use past…
14
votes
2 answers

How large should the replay buffer be?

I'm learning the DDPG algorithm by following this link: OpenAI Spinning Up document on DDPG, where it is written: In order for the algorithm to have stable behavior, the replay buffer should be large enough to contain a wide range of…
14
votes
3 answers

Why exactly do neural networks require i.i.d. data?

In reinforcement learning, successive states (actions and rewards) can be correlated. In the DQN architecture, an experience replay buffer was used to avoid training the neural network (NN) that represents the $Q$ function with correlated (or…
8
votes
1 answer

Is Experience Replay like dreaming?

Drawing parallels between Machine Learning techniques and the human brain is a dangerous operation. When it is done successfully, it can be a powerful tool for popularization, but when it is done without precaution, it can lead to major…
8
votes
2 answers

What is experience replay in layman's terms?

I've been reading Google's DeepMind Atari paper and I'm trying to understand the concept of "experience replay". Experience replay comes up in a lot of other reinforcement learning papers (particularly the AlphaGo paper), so I want to understand…
7
votes
1 answer

In imitation learning, do you simply inject optimal tuples of experience $(s, a, r, s')$ into your experience replay buffer?

Due to my RL algorithm having difficulties learning some control actions, I've decided to use imitation learning/apprenticeship learning to guide my RL to perform the optimal actions. I've read a few articles on the subject and just want to confirm…
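For what it's worth, the simplest version of injecting such tuples is a sketch like the following, where the buffer and the `expert_demonstrations` list are made up for illustration (not taken from the question or any specific library):

```python
import random
from collections import deque

# Illustrative: a plain uniform replay buffer and a couple of toy expert
# (state, action, reward, next_state, done) tuples gathered offline.
buffer = deque(maxlen=100_000)
expert_demonstrations = [
    ((0.0, 0.0), 1, 0.5, (0.1, 0.0), False),
    ((0.1, 0.0), 0, 1.0, (0.1, 0.1), True),
]

# Seed the buffer with the expert tuples before (or alongside) the agent's own
# experience, so sampled minibatches mix demonstrations with self-generated data.
for transition in expert_demonstrations:
    buffer.append(transition)

batch = random.sample(buffer, k=min(2, len(buffer)))
```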
7
votes
2 answers

Which kind of prioritized experience replay should I use?

The Prioritized Experience Replay paper gives two different ways of sampling from the replay buffer. One, called "proportional prioritization", assigns each transition a priority proportional to its absolute TD error: $$p_i = |\delta_i|+\epsilon$$ The…
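As a reference for the two schemes the question compares, here is a small numpy sketch of how priorities turn into sampling probabilities under each; the TD errors and hyperparameter values are made up, and in both cases the paper uses $P(i) = p_i^\alpha / \sum_k p_k^\alpha$:

```python
import numpy as np

# Illustrative TD errors for a handful of stored transitions.
td_errors = np.array([0.5, -2.0, 0.1, 1.0])
alpha, eps = 0.6, 1e-6

# Proportional prioritization: p_i = |delta_i| + eps
p_proportional = np.abs(td_errors) + eps

# Rank-based prioritization: p_i = 1 / rank(i), transitions ranked by |delta_i|
ranks = np.empty(len(td_errors))
order = np.argsort(-np.abs(td_errors))           # indices from largest to smallest |delta|
ranks[order] = np.arange(1, len(td_errors) + 1)  # rank 1 = largest |delta|
p_rank = 1.0 / ranks

# Either way, the sampling probability is P(i) = p_i^alpha / sum_k p_k^alpha.
def sampling_probs(p, alpha):
    scaled = p ** alpha
    return scaled / scaled.sum()

print(sampling_probs(p_proportional, alpha))
print(sampling_probs(p_rank, alpha))
```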
5
votes
1 answer

How does being on-policy prevent us from using the replay buffer with the policy gradients?

One of the approaches to improving the stability of the Policy Gradient family of methods is to use multiple environments in parallel. The reason behind this is the fundamental problem we discussed in Chapter 6, Deep Q-Network, when we talked about…
4
votes
0 answers

Where does this variation of the importance sampling weight come from?

I have seen a variation of the importance sampling (IS) weight in Prioritized Experience Replay (PER) in some implementations, compared to the original paper's approach, stated (in Section 3.4) as: $$ w_{i}=\left(\frac{1}{N} \cdot…
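For reference, the weight the excerpt starts to quote, as stated in Section 3.4 of the original paper, is
$$ w_{i}=\left(\frac{1}{N} \cdot \frac{1}{P(i)}\right)^{\beta}, $$
typically normalized by $1/\max_i w_i$ for stability before being applied to the Q-learning update; the variation asked about differs from this form.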
4
votes
1 answer

Why do DQNs tend to forget?

Why do DQNs tend to forget? Is it because, when you feed highly correlated samples, your model (a function approximator) doesn't give a general solution? For example: I use level 1 experiences, my model $p$ is fitted to learn how to play that…
4
votes
1 answer

Experience Replay Not Always Giving Better Results

I have recently started working on a control problem using a Deep Q Network as proposed by DeepMind (https://arxiv.org/abs/1312.5602). Initially, I implemented it without Experience Replay. The results were very satisfying, although after…
3
votes
1 answer

When using experience replay, do we update the parameters for all samples of the mini-batch or for each sample in the mini-batch separately?

I've been reading Google's DeepMind Atari paper and I'm trying to understand how to implement experience replay. Do we update the parameters $\theta$ of function $Q$ once for all the samples of the minibatch, or do we do that for each sample of the…
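A common reading is that the loss is averaged over the minibatch and $\theta$ is updated once per sampled batch. The sketch below illustrates that in PyTorch with made-up shapes, random data, and no target network, so it is an illustration rather than the paper's exact procedure:

```python
import torch
import torch.nn as nn

# Illustrative setup: a tiny Q-network and one randomly generated minibatch.
state_dim, n_actions, batch_size, gamma = 4, 2, 32, 0.99
q_net = nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU(), nn.Linear(64, n_actions))
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)

states      = torch.randn(batch_size, state_dim)
actions     = torch.randint(0, n_actions, (batch_size,))
rewards     = torch.randn(batch_size)
next_states = torch.randn(batch_size, state_dim)
dones       = torch.zeros(batch_size)

# Bootstrapped targets for the whole batch (no gradient through the target).
with torch.no_grad():
    targets = rewards + gamma * (1 - dones) * q_net(next_states).max(dim=1).values

# Q(s, a) for the actions actually taken, for every sample in the batch.
q_taken = q_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)

# One loss averaged over the minibatch -> one gradient step updates theta once.
loss = nn.functional.mse_loss(q_taken, targets)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```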
3
votes
1 answer

Why is a large replay buffer inefficient?

OpenAI Spinning Up says ... the replay buffer should be large enough to contain a wide range of experiences, but it may not always be good to keep everything. If you only use the very-most recent data, you will overfit to that and things will break;…
3
votes
0 answers

Why is it necessary to divide the priority range according to the batch size in Prioritized Experience Replay?

According to DeepMind's paper Prioritized Experience Replay (2016), specifically Appendix B.2.1, "Proportional prioritization" (p. 13), one should equally divide the priority range $[0, p_\text{total}]$ into $k$ ranges, where $k$ is the size of the…
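A minimal numpy sketch of that segment-based (stratified) sampling, with made-up priorities and a cumulative-sum array standing in for the paper's sum-tree:

```python
import numpy as np

# Illustrative priorities for stored transitions (e.g. (|TD error| + eps)^alpha).
priorities = np.array([0.1, 2.0, 0.5, 1.2, 0.05, 0.8])
k = 3                               # minibatch size = number of segments
cumulative = np.cumsum(priorities)  # a real implementation would use a sum-tree
p_total = cumulative[-1]

# Split [0, p_total] into k equal segments and draw one uniform sample per
# segment, so every minibatch is stratified across the priority range.
bounds = np.linspace(0.0, p_total, k + 1)
targets = np.random.uniform(bounds[:-1], bounds[1:])

# Map each sampled mass value back to the transition that owns that slice.
indices = np.searchsorted(cumulative, targets)
print(indices)
```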