
Say that I have a simple Actor-Critic architecture. (I am not familiar with TensorFlow, but) in PyTorch we need to specify the parameters when defining an optimizer (SGD, Adam, etc.), so we can define two separate optimizers for the Actor and the Critic, and the backward pass will be

actor_optimizer.zero_grad()
actor_loss.backward()
actor_optimizer.step()

critic_optimizer.zero_grad()
critic_loss.backward()
critic_optimizer.step()

or we can use a single optimizer for both the Actor's and the Critic's parameters, so the backward pass looks like

loss = actor_loss + critic_loss
optimizer.zero_grad()
loss.backward()
optimizer.step()
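A minimal runnable sketch of both update orders (the toy linear networks and placeholder losses below are my own assumptions, just to show where zero_grad()/backward()/step() go in each approach):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
actor = nn.Linear(4, 2)   # hypothetical tiny actor head
critic = nn.Linear(4, 1)  # hypothetical tiny critic head
x = torch.randn(8, 4)

# Approach 1: separate optimizers, separate backward passes.
actor_optimizer = torch.optim.Adam(actor.parameters(), lr=1e-3)
critic_optimizer = torch.optim.Adam(critic.parameters(), lr=1e-3)
actor_loss = -actor(x).mean()           # placeholder losses
critic_loss = critic(x).pow(2).mean()
actor_optimizer.zero_grad()
actor_loss.backward()
actor_optimizer.step()
critic_optimizer.zero_grad()
critic_loss.backward()
critic_optimizer.step()

# Approach 2: one optimizer over both parameter sets, one backward pass.
optimizer = torch.optim.Adam(
    list(actor.parameters()) + list(critic.parameters()), lr=1e-3
)
loss = -actor(x).mean() + critic(x).pow(2).mean()
optimizer.zero_grad()
loss.backward()
optimizer.step()
```

Note that in approach 2 the summed loss is differentiated once, so both parameter sets receive gradients from the single `backward()` call.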

I have 2 questions regarding both approaches:

  1. Are there any considerations (pros and cons) for the single joint optimizer versus the separate-optimizers approach?

  2. If I want to save the best Agent (Actor and Critic) periodically (based on a predefined testing environment), do I always have to update the saved Critic, regardless of the current Agent's performance? Because (correct me if I'm wrong) the Critic's most basic purpose is only to predict the action-value or state-value, so a more-trained Critic should be better.
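The periodic saving described in question 2 might be sketched like this (the names `actor`, `critic`, `best_score`, and the file name are my assumptions, not from the question):

```python
import torch
import torch.nn as nn

actor = nn.Linear(4, 2)   # hypothetical networks, as above
critic = nn.Linear(4, 1)

best_score, score = 1.0, 2.0  # dummy evaluation results for illustration

# Save both networks whenever the agent beats its best evaluation score.
if score > best_score:
    best_score = score
    torch.save(
        {"actor": actor.state_dict(), "critic": critic.state_dict()},
        "best_agent.pt",
    )
```

Whether the Critic in the checkpoint should instead always track the latest training state is exactly what the question asks.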

Sanyou
  • You're asking 2 distinct questions here. I would suggest that you ask the second question in a separate post. – nbro Oct 17 '21 at 02:38

1 Answer


I am also very curious about this. I have been implementing A2C in PyTorch from scratch and have tried both a single optimizer and separate optimizers; the separate case learned much faster. I believe it may have something to do with the loss coefficients: balancing the critic coefficient so that the two losses stayed within a similar range seemed to work.
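The balancing idea can be sketched as weighting the critic term before summing (the 0.5 value and the toy networks below are assumptions on my part, though a value coefficient around 0.5 is common in A2C implementations):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
actor = nn.Linear(4, 2)   # hypothetical networks for illustration
critic = nn.Linear(4, 1)
optimizer = torch.optim.Adam(
    list(actor.parameters()) + list(critic.parameters()), lr=7e-4
)

x = torch.randn(8, 4)
actor_loss = -actor(x).mean()           # placeholder losses
critic_loss = critic(x).pow(2).mean()

value_coef = 0.5  # hypothetical weight that keeps the two terms comparable
loss = actor_loss + value_coef * critic_loss

optimizer.zero_grad()
loss.backward()
optimizer.step()
```

With a single optimizer, this coefficient is the main knob for trading off how strongly the shared update is driven by the critic loss versus the actor loss.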

Elfurd