Questions tagged [a3c]
For questions related to the asynchronous advantage actor-critic (A3C) algorithm.
14 questions
5
votes
1 answer
How does being on-policy prevent us from using the replay buffer with the policy gradients?
One of the approaches to improving the stability of the Policy
Gradient family of methods is to use multiple environments in
parallel. The reason behind this is the fundamental problem we
discussed in Chapter 6, Deep Q-Network, when we talked about…

jgauth
- 161
- 10
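As a point of reference for this question, here is a minimal sketch of the parallel-environment idea the excerpt refers to, assuming the classic Gym step/reset API and a placeholder policy: each environment contributes one fresh, on-policy transition per update, and nothing is kept in a replay buffer.

```python
import gym
import numpy as np

# Hypothetical setup: several environments stepped in lockstep so every update
# uses fresh, decorrelated, on-policy samples instead of a replay buffer.
envs = [gym.make("CartPole-v1") for _ in range(8)]
states = [env.reset() for env in envs]          # classic (pre-gymnasium) Gym API assumed

def policy(state):
    # Placeholder: in A3C this would sample from the current softmax policy.
    return np.random.randint(2)

batch = []
for i, env in enumerate(envs):
    action = policy(states[i])
    next_state, reward, done, info = env.step(action)
    batch.append((states[i], action, reward, next_state, done))
    states[i] = env.reset() if done else next_state

# `batch` is consumed by one gradient update and then discarded,
# which is what keeps the data on-policy.
```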
4
votes
0 answers
Can deep successor representations be used with the A3C algorithm?
Deep Successor Representations (DSR) have given better performance in tasks like navigation when compared to standard model-free RL approaches. Basically, DSR is a hybrid of model-free RL and model-based RL. But the original work only used value-based…

Shamane Siriwardhana
- 191
- 6
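For context, the value decomposition this question builds on can be sketched as follows (the idea, not the paper's exact notation): the Q-value factors into a successor map m over learned state features φ and a set of reward weights w.

```latex
Q^{\pi}(s, a) \approx m^{\pi}(s, a)^{\top} \mathbf{w},
\qquad
m^{\pi}(s, a) = \mathbb{E}_{\pi}\!\left[\, \sum_{t=0}^{\infty} \gamma^{t}\, \phi(s_t) \;\middle|\; s_0 = s,\ a_0 = a \right]
```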
3
votes
1 answer
What are the pros and cons of increasing or decreasing the number of worker processes in A3C?
In A3C, there are several child processes and one master process. The child processes compute the loss and the gradients via backpropagation, and the master process accumulates those gradients and updates the parameters, if I understand it correctly.
But I wonder how I should…

Blaszard
- 1,027
- 2
- 11
- 25
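A minimal single-process sketch of the gradient flow this question describes, assuming a PyTorch-style setup (the tiny model, the dummy loss, and the learning rate are placeholders): the worker computes gradients on its local copy, copies them onto the shared parameters, and the shared optimizer applies the update.

```python
import torch
import torch.nn as nn

shared_model = nn.Linear(4, 2)                   # stands in for the global network
shared_model.share_memory()
optimizer = torch.optim.RMSprop(shared_model.parameters(), lr=7e-4)

local_model = nn.Linear(4, 2)                    # the worker's copy
local_model.load_state_dict(shared_model.state_dict())   # sync before a rollout

# Dummy stand-in for the actor-critic loss computed from the worker's rollout.
loss = local_model(torch.randn(5, 4)).pow(2).mean()

optimizer.zero_grad()
loss.backward()
# Push the worker's gradients onto the shared parameters, then apply one update.
for local_p, shared_p in zip(local_model.parameters(), shared_model.parameters()):
    shared_p.grad = local_p.grad
optimizer.step()
```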
2
votes
1 answer
Why do we also need to normalize action values in continuous action spaces?
I was reading tips & tricks for training in DRL here, and I noticed the following:
always normalize your observation space when you can, i.e., when you know the boundaries
normalize your action space and make it symmetric when continuous (cf…

mkanakis
- 175
- 6
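One common way to get a symmetric action space is sketched below as a custom Gym wrapper (the wrapper name and the commented environment id are just examples): the agent always outputs actions in [-1, 1] and the wrapper rescales them to the environment's true bounds.

```python
import gym
import numpy as np

class SymmetricActionWrapper(gym.ActionWrapper):
    """Let the agent act in [-1, 1]; rescale to the environment's real [low, high]."""
    def action(self, action):
        low = self.env.action_space.low
        high = self.env.action_space.high
        action = np.clip(action, -1.0, 1.0)
        return low + (action + 1.0) * 0.5 * (high - low)

# Usage on any continuous-control task, e.g.:
# env = SymmetricActionWrapper(gym.make("Pendulum-v1"))
```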
1
vote
0 answers
Understanding loss function gradient in asynchronous advantage actor-critic (A3C) algorithm
This is a question I posted here. I am also asking it on this Stack Exchange site so that more people who could potentially answer get to see it.
In the A3C algorithm from the original paper:
the gradient with respect to log policy…

Kagaratsch
- 111
- 2
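For reference, the update this question asks about is the one from the original A3C paper: the log-policy gradient weighted by an n-step advantage estimate,

```latex
\nabla_{\theta'} \log \pi(a_t \mid s_t; \theta')\, A(s_t, a_t; \theta, \theta_v),
\qquad
A(s_t, a_t; \theta, \theta_v) = \sum_{i=0}^{k-1} \gamma^{i} r_{t+i}
  + \gamma^{k} V(s_{t+k}; \theta_v) - V(s_t; \theta_v).
```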
1
vote
1 answer
How do I create a custom gym environment based on an image?
I am trying to create my own gym environment for the A3C algorithm (one implementation is here). The custom environment is a simple login form for any site. I want to create an environment from an image. The idea is to take a screenshot of the web…

Ren
- 21
- 3
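A minimal skeleton of such an environment, assuming the classic (pre-gymnasium) Gym API; the class name, action meanings, and reward logic are placeholders, and the screenshot is stubbed with a blank image.

```python
import gym
from gym import spaces
import numpy as np

class LoginFormEnv(gym.Env):
    """Hypothetical environment whose observation is a screenshot (RGB image)."""

    def __init__(self, width=160, height=120):
        super().__init__()
        self.observation_space = spaces.Box(
            low=0, high=255, shape=(height, width, 3), dtype=np.uint8)
        self.action_space = spaces.Discrete(4)   # e.g. focus field 1/2, type, submit
        self._steps = 0

    def reset(self):
        self._steps = 0
        return self._screenshot()

    def step(self, action):
        self._steps += 1
        reward = 1.0 if action == 3 else 0.0      # placeholder reward logic
        done = self._steps >= 50
        return self._screenshot(), reward, done, {}

    def _screenshot(self):
        # A real implementation would capture the rendered web page here.
        return np.zeros(self.observation_space.shape, dtype=np.uint8)
```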
1
vote
0 answers
Is it OK to take random actions while training A3C, as in the code below?
I am trying to train an A3C algorithm, but I am getting the same output from the multinomial function.
Can I train A3C with random actions, as in the code below?
Can someone with expertise comment?
while count

user2783767
- 121
- 2
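For comparison, this is how A3C implementations usually draw actions: sampling from the softmax policy via `multinomial` rather than choosing uniformly at random, since purely random actions break the on-policy gradient and A3C gets its exploration from sampling plus the entropy bonus. The epsilon mixing below is optional and only shown as the "random action" variant the question asks about; the function and its defaults are a sketch, not the asker's code.

```python
import torch

def select_action(logits, epsilon=0.0):
    """Sample from the softmax policy; with probability epsilon pick uniformly at random."""
    probs = torch.softmax(logits, dim=-1)
    if epsilon > 0 and torch.rand(1).item() < epsilon:
        return torch.randint(probs.shape[-1], (1,)).item()
    return torch.multinomial(probs, num_samples=1).item()
```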
1
vote
0 answers
When past states contain useful information, does A3C perform better than TD3, given that TD3 does not use an LSTM?
I am trying to build an AI that needs to have some information about the past states as well. Therefore, LSTMs are suitable for this.
Now, I want to know that for a problem/game like Breakout, where we require previous states as well, does A3C…

user2783767
- 121
- 2
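A sketch of what "A3C with an LSTM" usually means in practice (module sizes are arbitrary): the recurrent hidden state carried between steps is what lets the policy use information from past observations.

```python
import torch
import torch.nn as nn

class RecurrentActorCritic(nn.Module):
    """Actor-critic whose core is an LSTM cell, so past states influence the policy."""

    def __init__(self, obs_dim, n_actions, hidden=128):
        super().__init__()
        self.encoder = nn.Linear(obs_dim, hidden)
        self.lstm = nn.LSTMCell(hidden, hidden)
        self.policy_head = nn.Linear(hidden, n_actions)
        self.value_head = nn.Linear(hidden, 1)

    def forward(self, obs, hx, cx):
        x = torch.relu(self.encoder(obs))
        hx, cx = self.lstm(x, (hx, cx))
        # Return logits, state value, and the recurrent state to carry forward.
        return self.policy_head(hx), self.value_head(hx), (hx, cx)
```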
1
vote
0 answers
How should I deal with variable batch size in A3C?
I am fairly new to reinforcement learning (RL) and deep RL. I have been trying to create my first agent (using A3C) that selects an optimal path with the reward being some associated completion time (the more optimal the path is, packets will be…

mkanakis
- 175
- 6
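For context, the variable batch size in A3C usually comes from the rollout loop itself, sketched below with placeholder `env` and `policy` objects: a worker collects at most `t_max` steps but stops early when the episode ends, so the batch handed to the update naturally varies in length.

```python
def collect_rollout(env, policy, t_max=20):
    """Collect up to t_max transitions; shorter if the episode terminates early."""
    states, actions, rewards = [], [], []
    state = env.reset()
    done = False
    for _ in range(t_max):
        action = policy(state)
        next_state, reward, done, _ = env.step(action)
        states.append(state)
        actions.append(action)
        rewards.append(reward)
        state = next_state
        if done:
            break
    # The caller bootstraps with V(state) when done is False, and with 0 otherwise.
    return states, actions, rewards, done
```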
0
votes
0 answers
The episode length increases at the start until it reaches a peak, then decreases. What can cause this unexpected behavior?
I am running the A3C algorithm to evaluate a policy based on a policy gradient method. I observe unexpected behavior in the reward and episode length at the start of training. As shown in the figure below, at the start of the training the…
0
votes
0 answers
TensorFlow-GPU and multiprocessing
I have finished implementing an Asynchronous Advantage Actor-Critic (A3C) agent in TensorFlow (GPU), using a single RMSprop optimizer with shared statistics. To do so, a central controller holds both the Global Network (ActorCriticModel) and the…

Lyn Cassidy
- 1
- 1
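The question is about TensorFlow, but the multiprocessing pattern it describes is easiest to illustrate with the PyTorch analogue (a sketch, not the asker's code): the global network's parameters live in shared memory and each spawned worker reads and updates them.

```python
import torch.multiprocessing as mp
import torch.nn as nn

def worker(rank, shared_model):
    # A real worker would build its own environment and local model here and
    # push gradients into shared_model; this sketch only reads the parameters.
    total = sum(p.sum().item() for p in shared_model.parameters())
    print(f"worker {rank} sees shared parameter sum {total:.4f}")

if __name__ == "__main__":
    shared_model = nn.Linear(4, 2)     # stands in for the global ActorCriticModel
    shared_model.share_memory()        # place parameters in shared memory
    processes = [mp.Process(target=worker, args=(rank, shared_model)) for rank in range(4)]
    for p in processes:
        p.start()
    for p in processes:
        p.join()
```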
0
votes
0 answers
How can I compare the results of AC1 with the results of A3C (on the CartPole environment)?
I am implementing A3C for the CartPole environment. I want to compare the results I got from A3C with the ones I got from AC1. The problem is I don't know which process to look at. If I use, let's say, 11 processes, should I take the first one which…
0
votes
1 answer
Why would the reward of A3C with LSTM suddenly drop off after many episodes?
I am training an A3C with a stacked LSTM.
During initial training, my model was giving a decent positive reward. However, after many episodes, its reward drops to zero and stays there for a long time. Is it because of the LSTM?
Is this normal?
Should I…

user2783767
- 121
- 2
0
votes
2 answers
Why do I get the same action when testing A2C?
I'm working on an advantage actor-critic (A2C) reinforcement learning model, but when I test the model after training it for 3500 episodes, I start to get almost the same action for all testing episodes, whereas if I train the system for less than 850…

I_Al-thamary
- 52
- 1
- 13
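One common cause of this behaviour is the policy's entropy collapsing over a long training run. As a point of reference, here is a sketch of an A2C loss with an explicit entropy bonus (the coefficients are typical defaults, not values from the question): if the entropy term is too small or decays away, the policy can lock onto a single action.

```python
import torch
import torch.nn.functional as F

def a2c_loss(logits, values, actions, returns, value_coef=0.5, entropy_coef=0.01):
    """Policy loss + value loss - entropy bonus; the bonus discourages
    the policy from collapsing onto one action."""
    dist = torch.distributions.Categorical(logits=logits)
    advantages = returns - values.squeeze(-1)
    policy_loss = -(dist.log_prob(actions) * advantages.detach()).mean()
    value_loss = F.mse_loss(values.squeeze(-1), returns)
    entropy = dist.entropy().mean()
    return policy_loss + value_coef * value_loss - entropy_coef * entropy
```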