Questions tagged [a3c]
For questions related to the asynchronous advantage actor-critic (A3C) algorithm.
14 questions
5
votes
1 answer
How does being on-policy prevent us from using the replay buffer with the policy gradients?
One of the approaches to improving the stability of the Policy
Gradient family of methods is to use multiple environments in
parallel. The reason behind this is the fundamental problem we
discussed in Chapter 6, Deep Q-Network, when we talked about…

jgauth
- 161
- 10
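As a point of reference for this question, here is a minimal sketch of the parallel-environment idea the excerpt refers to, assuming the classic Gym step/reset API and a placeholder policy: each environment contributes one fresh, on-policy transition per update, and nothing is kept in a replay buffer.

```python
import gym
import numpy as np

# Hypothetical setup: several environments stepped in lockstep so every update
# uses fresh, decorrelated, on-policy samples instead of a replay buffer.
envs = [gym.make("CartPole-v1") for _ in range(8)]
states = [env.reset() for env in envs]          # classic (pre-gymnasium) Gym API assumed

def policy(state):
    # Placeholder: in A3C this would sample from the current softmax policy.
    return np.random.randint(2)

batch = []
for i, env in enumerate(envs):
    action = policy(states[i])
    next_state, reward, done, info = env.step(action)
    batch.append((states[i], action, reward, next_state, done))
    states[i] = env.reset() if done else next_state

# `batch` is consumed by one gradient update and then discarded,
# which is what keeps the data on-policy.
```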
4
votes
0 answers
Can deep successor representations be used with the A3C algorithm?
Deep Successor Representations (DSR) have given better performance in tasks like navigation when compared to standard model-free RL approaches. Basically, DSR is a hybrid of model-free RL and model-based RL. But the original work only used value-based…

Shamane Siriwardhana
- 191
- 6
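For context, the value decomposition this question builds on can be sketched as follows (the idea, not the paper's exact notation): the Q-value factors into a successor map m over learned state features φ and a set of reward weights w.

```latex
Q^{\pi}(s, a) \approx m^{\pi}(s, a)^{\top} \mathbf{w},
\qquad
m^{\pi}(s, a) = \mathbb{E}_{\pi}\!\left[\, \sum_{t=0}^{\infty} \gamma^{t}\, \phi(s_t) \;\middle|\; s_0 = s,\ a_0 = a \right]
```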
3
votes
1 answer
What are the pros and cons of increasing or decreasing the number of worker processes in A3C?
In A3C, there are several child processes and one master process. The child processes compute the loss and the gradients via backpropagation, and the master process accumulates those gradients and updates the parameters, if I understand it correctly.
But I wonder how I should…

Blaszard
- 1,027
- 2
- 11
- 25
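A minimal single-process sketch of the gradient flow this question describes, assuming a PyTorch-style setup (the tiny model, the dummy loss, and the learning rate are placeholders): the worker computes gradients on its local copy, copies them onto the shared parameters, and the shared optimizer applies the update.

```python
import torch
import torch.nn as nn

shared_model = nn.Linear(4, 2)                   # stands in for the global network
shared_model.share_memory()
optimizer = torch.optim.RMSprop(shared_model.parameters(), lr=7e-4)

local_model = nn.Linear(4, 2)                    # the worker's copy
local_model.load_state_dict(shared_model.state_dict())   # sync before a rollout

# Dummy stand-in for the actor-critic loss computed from the worker's rollout.
loss = local_model(torch.randn(5, 4)).pow(2).mean()

optimizer.zero_grad()
loss.backward()
# Push the worker's gradients onto the shared parameters, then apply one update.
for local_p, shared_p in zip(local_model.parameters(), shared_model.parameters()):
    shared_p.grad = local_p.grad
optimizer.step()
```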
2
votes
1 answer
Why do we also need to normalize action values in continuous action spaces?
I was reading tips & tricks for training in DRL here, and I noticed the following:
always normalize your observation space when you can, i.e., when you know the boundaries
normalize your action space and make it symmetric when continuous (cf…

mkanakis
- 175
- 6
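One common way to get a symmetric action space is sketched below as a custom Gym wrapper (the wrapper name and the commented environment id are just examples): the agent always outputs actions in [-1, 1] and the wrapper rescales them to the environment's true bounds.

```python
import gym
import numpy as np

class SymmetricActionWrapper(gym.ActionWrapper):
    """Let the agent act in [-1, 1]; rescale to the environment's real [low, high]."""
    def action(self, action):
        low = self.env.action_space.low
        high = self.env.action_space.high
        action = np.clip(action, -1.0, 1.0)
        return low + (action + 1.0) * 0.5 * (high - low)

# Usage on any continuous-control task, e.g.:
# env = SymmetricActionWrapper(gym.make("Pendulum-v1"))
```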
1
vote
0 answers
Understanding loss function gradient in asynchronous advantage actor-critic (A3C) algorithm
This is a question I posted here. I am also asking it on this Stack Exchange site so that more people who could potentially answer get to see it.
In the A3C algorithm from the original paper:
the gradient with respect to log policy…

Kagaratsch
- 111
- 2
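For reference, the update this question asks about is the one from the original A3C paper: the log-policy gradient weighted by an n-step advantage estimate,

```latex
\nabla_{\theta'} \log \pi(a_t \mid s_t; \theta')\, A(s_t, a_t; \theta, \theta_v),
\qquad
A(s_t, a_t; \theta, \theta_v) = \sum_{i=0}^{k-1} \gamma^{i} r_{t+i}
  + \gamma^{k} V(s_{t+k}; \theta_v) - V(s_t; \theta_v).
```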
1
vote
1 answer
How do I create a custom gym environment based on an image?
I am trying to create my own gym environment for the A3C algorithm (one implementation is here). The custom environment is a simple login form for any site. I want to create an environment from an image. The idea is to take a screenshot of the web…

Ren
- 21
- 3
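A minimal skeleton of such an environment, assuming the classic (pre-gymnasium) Gym API; the class name, action meanings, and reward logic are placeholders, and the screenshot is stubbed with a blank image.

```python
import gym
from gym import spaces
import numpy as np

class LoginFormEnv(gym.Env):
    """Hypothetical environment whose observation is a screenshot (RGB image)."""

    def __init__(self, width=160, height=120):
        super().__init__()
        self.observation_space = spaces.Box(
            low=0, high=255, shape=(height, width, 3), dtype=np.uint8)
        self.action_space = spaces.Discrete(4)   # e.g. focus field 1/2, type, submit
        self._steps = 0

    def reset(self):
        self._steps = 0
        return self._screenshot()

    def step(self, action):
        self._steps += 1
        reward = 1.0 if action == 3 else 0.0      # placeholder reward logic
        done = self._steps >= 50
        return self._screenshot(), reward, done, {}

    def _screenshot(self):
        # A real implementation would capture the rendered web page here.
        return np.zeros(self.observation_space.shape, dtype=np.uint8)
```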
1
vote
0 answers
Is it OK to take random actions while training A3C, as in the code below?
I am trying to train an A3C algorithm, but I am getting the same output from the multinomial function.
Can I train A3C with random actions, as in the code below?
Can someone with expertise comment?
while count

user2783767
- 121
- 2
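For comparison, this is how A3C implementations usually draw actions: sampling from the softmax policy via `multinomial` rather than choosing uniformly at random, since purely random actions break the on-policy gradient and A3C gets its exploration from sampling plus the entropy bonus. The epsilon mixing below is optional and only shown as the "random action" variant the question asks about; the function and its defaults are a sketch, not the asker's code.

```python
import torch

def select_action(logits, epsilon=0.0):
    """Sample from the softmax policy; with probability epsilon pick uniformly at random."""
    probs = torch.softmax(logits, dim=-1)
    if epsilon > 0 and torch.rand(1).item() < epsilon:
        return torch.randint(probs.shape[-1], (1,)).item()
    return torch.multinomial(probs, num_samples=1).item()
```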
1
vote
0 answers
When past states contain useful information, does A3C perform better than TD3, given that TD3 does not use an LSTM?
I am trying to build an AI that needs to have some information about the past states as well. Therefore, LSTMs are suitable for this.
Now, I want to know that for a problem/game like Breakout, where we require previous states as well, does A3C…

user2783767
- 121
- 2
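A sketch of what "A3C with an LSTM" usually means in practice (module sizes are arbitrary): the recurrent hidden state carried between steps is what lets the policy use information from past observations.

```python
import torch
import torch.nn as nn

class RecurrentActorCritic(nn.Module):
    """Actor-critic whose core is an LSTM cell, so past states influence the policy."""

    def __init__(self, obs_dim, n_actions, hidden=128):
        super().__init__()
        self.encoder = nn.Linear(obs_dim, hidden)
        self.lstm = nn.LSTMCell(hidden, hidden)
        self.policy_head = nn.Linear(hidden, n_actions)
        self.value_head = nn.Linear(hidden, 1)

    def forward(self, obs, hx, cx):
        x = torch.relu(self.encoder(obs))
        hx, cx = self.lstm(x, (hx, cx))
        # Return logits, state value, and the recurrent state to carry forward.
        return self.policy_head(hx), self.value_head(hx), (hx, cx)
```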
1
vote
0 answers
How should I deal with variable batch size in A3C?
I am fairly new to reinforcement learning (RL) and deep RL. I have been trying to create my first agent (using A3C) that selects an optimal path with the reward being some associated completion time (the more optimal the path is, packets will be…

mkanakis
- 175
- 6
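For context, the variable batch size in A3C usually comes from the rollout loop itself, sketched below with placeholder `env` and `policy` objects: a worker collects at most `t_max` steps but stops early when the episode ends, so the batch handed to the update naturally varies in length.

```python
def collect_rollout(env, policy, t_max=20):
    """Collect up to t_max transitions; shorter if the episode terminates early."""
    states, actions, rewards = [], [], []
    state = env.reset()
    done = False
    for _ in range(t_max):
        action = policy(state)
        next_state, reward, done, _ = env.step(action)
        states.append(state)
        actions.append(action)
        rewards.append(reward)
        state = next_state
        if done:
            break
    # The caller bootstraps with V(state) when done is False, and with 0 otherwise.
    return states, actions, rewards, done
```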
0
votes
0 answers
The episode length increases at the start until it reaches a peak, then decreases. What can cause this unexpected behavior?
I am running the A3C algorithm to evaluate a policy based on a policy gradient method. I observe unexpected behavior in the reward and episode length at the start of training. As shown in the figure below, at the start of the training the…
0
votes
0 answers
TensorFlow-GPU and multiprocessing
I have finished implementing an Asynchronous Advantage Actor-Critic (A3C) agent in TensorFlow (GPU), using a single RMSprop optimizer with shared statistics. To do so, a central controller holds both the Global Network (ActorCriticModel) and the…

Lyn Cassidy
- 1
- 1
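The question is about TensorFlow, but the multiprocessing pattern it describes is easiest to illustrate with the PyTorch analogue (a sketch, not the asker's code): the global network's parameters live in shared memory and each spawned worker reads and updates them.

```python
import torch.multiprocessing as mp
import torch.nn as nn

def worker(rank, shared_model):
    # A real worker would build its own environment and local model here and
    # push gradients into shared_model; this sketch only reads the parameters.
    total = sum(p.sum().item() for p in shared_model.parameters())
    print(f"worker {rank} sees shared parameter sum {total:.4f}")

if __name__ == "__main__":
    shared_model = nn.Linear(4, 2)     # stands in for the global ActorCriticModel
    shared_model.share_memory()        # place parameters in shared memory
    processes = [mp.Process(target=worker, args=(rank, shared_model)) for rank in range(4)]
    for p in processes:
        p.start()
    for p in processes:
        p.join()
```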
0
votes
0 answers
How can I compare the results of AC1 with the results of A3C (on the CartPole environment)?
I am implementing A3C for the CartPole environment. I want to compare the results I got from A3C with the ones I got from AC1. The problem is I don't know which process to look at. If I use, let's say, 11 processes, should I take the first one which…
0
votes
1 answer
Why would the reward of A3C with LSTM suddenly drop off after many episodes?
I am training an A3C with a stacked LSTM.
During initial training, my model was giving a decent positive reward. However, after many episodes, its reward drops to zero and stays there for a long time. Is it because of the LSTM?
Is this normal?
Should I…

user2783767
- 121
- 2
0
votes
2 answers
Why do I get the same action when testing A2C?
I'm working on an advantage actor-critic (A2C) reinforcement learning model, but when I test the model after training it for 3500 episodes, I start to get almost the same action for all testing episodes, whereas if I train the system for less than 850…

I_Al-thamary
- 52
- 1
- 13
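One common cause of this behaviour is the policy's entropy collapsing over a long training run. As a point of reference, here is a sketch of an A2C loss with an explicit entropy bonus (the coefficients are typical defaults, not values from the question): if the entropy term is too small or decays away, the policy can lock onto a single action.

```python
import torch
import torch.nn.functional as F

def a2c_loss(logits, values, actions, returns, value_coef=0.5, entropy_coef=0.01):
    """Policy loss + value loss - entropy bonus; the bonus discourages
    the policy from collapsing onto one action."""
    dist = torch.distributions.Categorical(logits=logits)
    advantages = returns - values.squeeze(-1)
    policy_loss = -(dist.log_prob(actions) * advantages.detach()).mean()
    value_loss = F.mse_loss(values.squeeze(-1), returns)
    entropy = dist.entropy().mean()
    return policy_loss + value_coef * value_loss - entropy_coef * entropy
```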