Questions tagged [experience-replay]
For questions related to the "experience replay" buffer (as used in Deep Q-Networks and similar works).
50 questions
18
votes
1 answer
How does LSTM in deep reinforcement learning differ from experience replay?
In the paper Deep Recurrent Q-Learning for Partially Observable MDPs, the author processed the Atari game frames with an LSTM layer at the end. My questions are:
How does this method differ from experience replay, as they both use past…

Kevin. Fang
- 353
- 1
- 2
- 7
14
votes
2 answers
How large should the replay buffer be?
I'm learning the DDPG algorithm by following the OpenAI Spinning Up document on DDPG, where it is written
In order for the algorithm to have stable behavior, the replay buffer should be large enough to contain a wide range of…

ycenycute
- 341
- 1
- 2
- 6
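
For the buffer-size question above, a minimal sketch of a fixed-capacity replay buffer may make the trade-off concrete: the `capacity` argument is the hyperparameter being asked about, and the default used here is only illustrative, not a recommendation from the Spinning Up docs.

```python
import random
from collections import deque


class ReplayBuffer:
    """Minimal fixed-capacity replay buffer (a sketch, not a tuned implementation)."""

    def __init__(self, capacity=1_000_000):  # 1e6 is a common choice, not a rule
        # Once the deque is full, appending silently evicts the oldest transition,
        # which is what bounds the "range of experiences" the buffer can hold.
        self.buffer = deque(maxlen=capacity)

    def store(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform random sampling, as in vanilla DQN/DDPG.
        return random.sample(self.buffer, batch_size)

    def __len__(self):
        return len(self.buffer)
```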
14
votes
3 answers
Why exactly do neural networks require i.i.d. data?
In reinforcement learning, successive states (actions and rewards) can be correlated. An experience replay buffer was used in the DQN architecture to avoid training the neural network (NN) that represents the $Q$ function with correlated (or…

nbro
- 39,006
- 12
- 98
- 176
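
For the i.i.d. question above, a toy sketch (with made-up episode/step tags standing in for real transitions) illustrates the decorrelation argument: consecutive transitions come from a single episode, while a uniform sample from a large buffer mixes many episodes.

```python
import random

# Tag each toy "transition" with the episode it came from.
buffer = [(episode, step) for episode in range(100) for step in range(200)]

# What an online agent would train on: 32 consecutive, highly correlated transitions.
online_batch = buffer[5_000:5_032]

# What experience replay trains on: 32 transitions sampled uniformly from the buffer.
replay_batch = random.sample(buffer, 32)

print("episodes in the online batch:", {ep for ep, _ in online_batch})   # a single episode
print("episodes in the replay batch:", {ep for ep, _ in replay_batch})   # many episodes
```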
8
votes
1 answer
Is Experience Replay like dreaming?
Drawing parallels between Machine Learning techniques and the human brain is a risky exercise. When it is done successfully, it can be a powerful tool for popularization, but when it is done without precaution, it can lead to major…

16Aghnar
- 591
- 2
- 10
8
votes
2 answers
What is experience replay in layman's terms?
I've been reading Google DeepMind's Atari paper and I'm trying to understand the concept of "experience replay". Experience replay comes up in a lot of other reinforcement learning papers (particularly the AlphaGo paper), so I want to understand…

user491626
- 241
- 1
- 4
7
votes
1 answer
In imitation learning, do you simply inject optimal tuples of experience $(s, a, r, s')$ into your experience replay buffer?
Because my RL algorithm has difficulty learning some control actions, I've decided to use imitation learning/apprenticeship learning to guide my RL agent toward the optimal actions. I've read a few articles on the subject and just want to confirm…

Rui Nian
- 423
- 3
- 13
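
For the imitation-learning question above, here is a minimal sketch of the "inject expert tuples into the buffer" idea the asker describes; the `expert_transitions` list is a toy stand-in for demonstrations gathered offline, and this does not attempt DQfD-style supervised losses.

```python
import random
from collections import deque

# Toy stand-in: in practice these (s, a, r, s', done) tuples come from an expert or controller log.
expert_transitions = [((0.0,), 1, 1.0, (0.1,), False) for _ in range(500)]

buffer = deque(maxlen=100_000)

# 1) Inject the optimal tuples before RL training starts.
for transition in expert_transitions:
    buffer.append(transition)

# 2) Ordinary off-policy updates then draw minibatches that mix expert
#    experience with whatever the agent adds to the buffer later.
batch = random.sample(buffer, 32)
```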
7
votes
2 answers
Which kind of prioritized experience replay should I use?
The Prioritized Experience Replay paper gives two different ways of sampling from the replay buffer. One, called "proportional prioritization", assigns each transition a priority proportional to its TD-error.
$$p_i = |\delta_i|+\epsilon$$
The…

Philip Raeisghasem
- 2,028
- 9
- 29
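
For the prioritized-replay question above, a short sketch of the proportional variant: priorities follow $p_i = |\delta_i| + \epsilon$ and sampling probabilities are proportional to $p_i^\alpha$, where $\alpha$ is the prioritization exponent from the same paper. The TD-errors and constants below are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)

td_errors = rng.normal(size=1_000)      # toy TD-errors for the transitions in the buffer
eps, alpha = 1e-6, 0.6                  # illustrative constants, not tuned values

priorities = np.abs(td_errors) + eps                      # p_i = |delta_i| + eps
probs = priorities**alpha / np.sum(priorities**alpha)     # P(i) proportional to p_i^alpha

# Proportional prioritization: sample minibatch indices with probability P(i).
batch_idx = rng.choice(len(td_errors), size=32, p=probs)
```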
5
votes
1 answer
How does being on-policy prevent us from using the replay buffer with the policy gradients?
One of the approaches to improving the stability of the Policy Gradient family of methods is to use multiple environments in parallel. The reason behind this is the fundamental problem we discussed in Chapter 6, Deep Q-Network, when we talked about…

jgauth
- 161
- 10
4
votes
0 answers
Where does this variation of the importance sampling weight come from?
I have seen a variation of the importance sampling (IS) weight from Prioritized Experience Replay (PER) in some implementations, compared with the original paper's approach stated (in Section 3.4) as:
$$
w_{i}=\left(\frac{1}{N} \cdot…

HenDoNR
- 81
- 4
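
Without presuming to identify the specific variation the question refers to, the following sketch writes out the Section 3.4 weight as it appears in the original paper, $w_i = \left(\frac{1}{N}\cdot\frac{1}{P(i)}\right)^{\beta}$, together with the $1/\max_i w_i$ normalization the paper also mentions. The toy priorities are an assumption made only to keep the snippet runnable.

```python
import numpy as np

rng = np.random.default_rng(0)

N = 1_000                                  # number of transitions in the buffer
beta = 0.4                                 # IS exponent, annealed toward 1 in the paper

priorities = rng.random(N)                 # toy p_i values
probs = priorities / priorities.sum()      # toy P(i); the full method uses P(i) proportional to p_i^alpha

w = (1.0 / (N * probs)) ** beta            # w_i = (1/N * 1/P(i))^beta, as in Section 3.4
w /= w.max()                               # normalize by 1/max_i w_i so updates are only scaled down
```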
4
votes
1 answer
Why do DQNs tend to forget?
Why do DQNs tend to forget? Is it because when you feed them highly correlated samples, your model (the function approximator) doesn't learn a general solution?
For example:
I use level 1 experiences, my model $p$ is fitted to learn how to play that…

Chukwudi
- 349
- 2
- 7
4
votes
1 answer
What does the notation $p_t = \max_{i<t} p_i$ mean?
I am having a hard time converting line 6 of the prioritized experience replay algorithm from the original paper into plain English (see below):
I understand that new transitions (not visited before) are given maximal priority. On line 6 this would…

Hanzy
- 499
- 3
- 10
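
For the notation question above, a tiny sketch of how line 6 is usually implemented: a newly stored transition inherits the largest priority seen so far, so it is guaranteed to be replayed at least once. The fallback of 1.0 for an empty buffer is an implementation choice, not something specified in the paper.

```python
priorities = [0.5, 1.3, 0.2]          # priorities p_i of transitions already stored

# Line 6: p_t = max_{i < t} p_i, i.e. the new transition gets the maximal existing priority.
p_new = max(priorities) if priorities else 1.0
priorities.append(p_new)              # the new transition now has priority 1.3
```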
4
votes
1 answer
Experience Replay Not Always Giving Better Results
I have recently started working on a control problem using a Deep Q Network as proposed by DeepMind (https://arxiv.org/abs/1312.5602). Initially, I implemented it without Experience Replay. The results were very satisfying, although after…

George Papagiannis
- 41
- 4
3
votes
1 answer
When using experience replay, do we update the parameters for all samples of the mini-batch or for each sample in the mini-batch separately?
I've been reading Google DeepMind's Atari paper and I'm trying to understand how to implement experience replay.
Do we update the parameters $\theta$ of function $Q$ once for all the samples of the minibatch, or do we do that for each sample of the…

user491626
- 241
- 1
- 4
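
For the minibatch question above, the usual practice in DQN implementations is a single gradient step on the loss averaged over the whole minibatch, rather than one update per sample. The sketch below uses PyTorch with a toy linear network and made-up Bellman targets purely to show what that single update looks like.

```python
import torch

q_net = torch.nn.Linear(4, 2)                                # stand-in for the Q-network
optimizer = torch.optim.SGD(q_net.parameters(), lr=1e-3)

states = torch.randn(32, 4)                                  # minibatch of 32 sampled states
actions = torch.randint(0, 2, (32, 1))                       # actions taken in those states
targets = torch.randn(32)                                    # toy y_i from the Bellman backup

# One parameter update on theta using the mean loss over the whole minibatch.
q_values = q_net(states).gather(1, actions).squeeze(1)
loss = torch.nn.functional.mse_loss(q_values, targets)

optimizer.zero_grad()
loss.backward()
optimizer.step()
```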
3
votes
1 answer
Why is a large replay buffer inefficient?
OpenAI Spinning Up says
... the replay buffer should be large enough to contain a wide range of experiences, but it may not always be good to keep everything. If you only use the very-most recent data, you will overfit to that and things will break;…

Sara
- 131
- 3
3
votes
0 answers
Why is it necessary to divide the priority range according to the batch size in Prioritized Experience Replay?
According to DeepMinds's paper Prioritized Experience Replay (2016), specifically Appendix B.2.1 "Proportional prioritization" (p. 13), one should equally divide the priority range $[0, p_\text{total}]$ into $k$ ranges, where $k$ is the size of the…

Firas_
- 31
- 2
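
For the question above about dividing the priority range, here is a sketch of the stratified sampling described in Appendix B.2.1: split $[0, p_\text{total}]$ into $k$ equal segments (one per minibatch slot) and draw one sample uniformly inside each segment, so every minibatch spreads over the whole priority mass instead of clustering. The priorities are random toy values, and the `searchsorted` over a cumulative sum stands in for the $O(\log n)$ sum-tree lookup used in practice.

```python
import numpy as np

rng = np.random.default_rng(0)

priorities = rng.random(1_000)                 # toy p_i values for the buffered transitions
p_total = priorities.sum()
k = 32                                         # minibatch size = number of segments

# Split [0, p_total] into k equal ranges and sample one point uniformly in each.
bounds = np.linspace(0.0, p_total, k + 1)
samples = rng.uniform(bounds[:-1], bounds[1:])

# Map each sampled mass value back to a transition index.
cumulative = np.cumsum(priorities)
batch_idx = np.searchsorted(cumulative, samples)
batch_idx = np.minimum(batch_idx, len(priorities) - 1)   # guard the floating-point edge case
```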