Questions tagged [ddpg]

For questions related to the reinforcement learning algorithm called Deep Deterministic Policy Gradient (DDPG).

See https://spinningup.openai.com/en/latest/algorithms/ddpg.html for more info.

59 questions
14
votes
2 answers

How large should the replay buffer be?

I'm learning the DDPG algorithm by following this link: OpenAI Spinning Up document on DDPG, where it is written: In order for the algorithm to have stable behavior, the replay buffer should be large enough to contain a wide range of…
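A minimal sketch of such a fixed-capacity buffer (the class and method names are illustrative, not taken from the Spinning Up code):

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-capacity FIFO buffer; the oldest transitions are dropped when full."""

    def __init__(self, capacity=1_000_000):
        self.buffer = deque(maxlen=capacity)

    def add(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size=64):
        # Uniformly sample a minibatch of past transitions for the update step.
        return random.sample(self.buffer, batch_size)
```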
7
votes
0 answers

Is there a difference in the architecture of deep reinforcement learning when multiple actions are performed instead of a single action?

I've built a deep deterministic policy gradient reinforcement learning agent that can handle any game/task that has only one action. However, the agent seems to fail horribly when there are two or more actions. I tried to look online for…
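In most DDPG implementations, supporting several simultaneous actions only requires widening the actor's output layer to the action dimension; a rough PyTorch sketch (the layer sizes are arbitrary):

```python
import torch.nn as nn

class Actor(nn.Module):
    def __init__(self, state_dim, action_dim, max_action=1.0):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, action_dim), nn.Tanh(),  # one output per action component
        )
        self.max_action = max_action

    def forward(self, state):
        # tanh keeps each component in [-1, 1]; scale to the environment's range.
        return self.max_action * self.net(state)
```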
7
votes
2 answers

Why are reinforcement learning methods sample inefficient?

Reinforcement learning methods are considered to be extremely sample inefficient. For example, in a recent DeepMind paper by Hessel et al., they showed that in order to reach human-level performance on an Atari game running at 60 frames per second…
5
votes
1 answer

How does the Ornstein-Uhlenbeck process work, and how is it used in DDPG?

In section 3 of the paper Continuous control with deep reinforcement learning, the authors write As detailed in the supplementary materials we used an Ornstein-Uhlenbeck process (Uhlenbeck & Ornstein, 1930) to generate temporally correlated…
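A minimal sketch of Ornstein-Uhlenbeck exploration noise as it is typically added to DDPG actions (the parameter values are common defaults, not necessarily those of the paper):

```python
import numpy as np

class OUNoise:
    """x_{t+1} = x_t + theta * (mu - x_t) * dt + sigma * sqrt(dt) * N(0, 1)."""

    def __init__(self, action_dim, mu=0.0, theta=0.15, sigma=0.2, dt=1e-2):
        self.mu, self.theta, self.sigma, self.dt = mu, theta, sigma, dt
        self.x = np.ones(action_dim) * mu

    def reset(self):
        self.x[:] = self.mu

    def sample(self):
        # Mean-reverting update: consecutive samples are correlated, which gives
        # smoother exploration than independent Gaussian noise at each step.
        dx = self.theta * (self.mu - self.x) * self.dt \
             + self.sigma * np.sqrt(self.dt) * np.random.randn(len(self.x))
        self.x = self.x + dx
        return self.x
```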
5
votes
1 answer

Why is Q2 a more or less independent estimate in Twin Delayed DDPG (TD3)?

Twin Delayed Deep Deterministic policy gradient (TD3) is inspired by both double Q-learning and double DQN. In double Q-learning, I understand that Q1 and Q2 are independent because they are trained on different samples. In double DQN, I understand…
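Whether the two critics are truly independent is the question; the mechanical part is only that the TD3 target uses the minimum of the two target critics, roughly as follows (function and argument names are illustrative):

```python
import torch

def td3_target(reward, next_state, done, target_actor, target_q1, target_q2,
               gamma=0.99, noise_std=0.2, noise_clip=0.5):
    # Target policy smoothing: add clipped noise to the target action.
    next_action = target_actor(next_state)
    noise = (torch.randn_like(next_action) * noise_std).clamp(-noise_clip, noise_clip)
    next_action = (next_action + noise).clamp(-1.0, 1.0)

    # Clipped double-Q: take the smaller of the two critic estimates
    # to reduce overestimation bias.
    q_next = torch.min(target_q1(next_state, next_action),
                       target_q2(next_state, next_action))
    return reward + gamma * (1.0 - done) * q_next
```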
4
votes
1 answer

What made your DDPG implementation on your environment work?

I am working on a scheduling problem that has inherent randomness. The dimensions of the action and state spaces are 1 and 5 respectively. I am using DDPG, but it seems extremely unstable, and so far it isn't showing much learning. I've tried to adjust…
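One detail that often decides whether DDPG learns at all is the soft (Polyak) update of the target networks; a sketch, assuming PyTorch-style modules and an illustrative tau:

```python
def soft_update(target_net, online_net, tau=0.005):
    # Polyak averaging: target parameters slowly track the online network,
    # which tends to stabilize the bootstrapped critic targets.
    for t_param, param in zip(target_net.parameters(), online_net.parameters()):
        t_param.data.copy_(tau * param.data + (1.0 - tau) * t_param.data)
```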
4
votes
1 answer

How to avoid rapid actuator movements in favor of smooth movements in a continuous state and action space problem?

I'm working on a continuous state / continuous action controller. It shall control a certain roll angle of an aircraft by issuing the correct aileron commands (in $[-1, 1]$). To this end, I use a neural network and the DDPG algorithm, which shows…
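One common workaround, not taken from the question itself, is to shape the reward with a penalty on the change between consecutive actions; a hypothetical sketch:

```python
def shaped_reward(base_reward, action, prev_action, smooth_coef=0.1):
    # Hypothetical shaping term: penalize large changes between consecutive
    # aileron commands so the learned policy favors smooth control.
    rate_penalty = smooth_coef * abs(action - prev_action)
    return base_reward - rate_penalty
```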
4
votes
0 answers

What is the simplest policy gradient method to implement for a problem with a continuous action space?

I have a problem I would like to tackle with RL, but I am not sure if it is even doable. My agent has to figure out how to fill a very large vector (let's say from 600 to 4000 in the most complex setting) made of natural numbers, i.e. a 600 vector…
3
votes
0 answers

How to deal with a moving target in the Lunar Lander environment with DDPG?

I have noticed that DDPG does rather well at solving environments with a static target. For example, in the default Lunar Lander, the flags do not change position. So the DDPG model learns how to get to the center of the screen and land fairly…
3
votes
1 answer

Appropriate algorithm for RL problem with sparse rewards, continuous actions and significant stochasticity

I'm working on an RL problem with the following properties: The rewards are extremely sparse, i.e. all rewards are 0 except the terminal non-zero reward. Ideally I would not use any reward engineering, as that would lead to a different optimization…
3
votes
0 answers

How does adding noise to the action in DDPG help in learning?

I can't understand how perturbing the action generated by the actor network in DDPG with a noise term helps in exploration.
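Mechanically, exploration in DDPG amounts to taking the deterministic action plus noise, clipped to the valid range; a sketch (the noise_process object is assumed to expose a sample() method, e.g. Gaussian or Ornstein-Uhlenbeck noise):

```python
import numpy as np

def exploratory_action(actor, state, noise_process, low=-1.0, high=1.0):
    # The actor alone is deterministic: the same state always gives the same
    # action. Adding noise makes the behavior policy try nearby actions, and
    # the off-policy critic can still learn from those transitions.
    action = actor(state)
    return np.clip(action + noise_process.sample(), low, high)
```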
3
votes
1 answer

Purpose of using actor-critic algorithms under deterministic MDP dynamics?

One of the main disadvantages of the MC Policy Gradient algorithm (REINFORCE), as described, say, here, is the fact that it has high variance (the returns we sample will vary significantly from episode to episode). Therefore it is perfectly…
3
votes
1 answer

Training actor-critic algorithms in games with opponents

I am wondering how I am supposed to train a model using actor-critic algorithms in environments with opponents. I tried the following (using A3C and DDPG): play against a random player. I had rather good results, but not as good as expected, since…
3
votes
0 answers

Should noise (such as OU) be decreased over time in actor / critic algorithms?

In most of the RL algorithms I have seen, there is a coefficient that reduces action exploration over time, to help convergence. But in actor-critic or other algorithms (A3C, DDPG, ...) used in continuous action spaces, the different implementations I have seen…
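If one does decay the noise, a simple linear schedule on the noise scale is enough; an illustrative sketch (names and values are assumptions):

```python
def noise_scale(step, start=0.3, end=0.05, decay_steps=100_000):
    # Linear decay of the noise standard deviation from `start` to `end`;
    # many DDPG implementations instead keep the scale fixed throughout training.
    frac = min(step / decay_steps, 1.0)
    return start + frac * (end - start)
```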
3
votes
0 answers

Can I use deterministic policy gradient methods for stochastic policy learning?

Can I treat a stochastic policy (over a finite action space of size $n$) as a deterministic policy (in the set of probability distribution in $\mathbb{R}^n$)? It seems to me that nothing is broken by making this mental translation, except that the…
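The "mental translation" in the question can be made concrete by letting the deterministic actor output a point in the probability simplex (e.g. via a softmax over logits) and sampling the discrete action from it; a small illustrative sketch:

```python
import numpy as np

def softmax(logits):
    z = logits - np.max(logits)
    e = np.exp(z)
    return e / e.sum()

def act(actor_logits, rng=np.random.default_rng()):
    # The actor's output is a deterministic "action" in R^n (logits); the
    # environment-facing discrete action is drawn from the induced distribution.
    probs = softmax(actor_logits)
    return rng.choice(len(probs), p=probs)
```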