I am using the Stable-Baselines3 implementation of the Soft Actor-Critic (SAC) algorithm. The plotted training curves look promising, but I am not sure how to interpret the actor and critic losses. The entropy coefficient $\alpha$ is learned automatically during training. As the entropy decreases, the critic loss and actor loss decrease as well.

  • How does the entropy coefficient affect the losses?
  • Can this be interpreted as the value estimates becoming more accurate as the focus shifts from exploration to exploitation?
  • How should negative actor losses be interpreted, and what do actor losses tell us in general?
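
For context on the questions above, here is a minimal sketch of how I understand the three losses to be computed in the standard SAC formulation with a learned entropy coefficient. It is plain Python on precomputed numbers, not Stable-Baselines3 code: the function `sac_losses`, the dictionary keys, and all values in `batch` are hypothetical and only meant to illustrate where $\alpha$ enters.

```python
import math


def mean(values):
    return sum(values) / len(values)


def sac_losses(batch, log_alpha, gamma=0.99, target_entropy=-1.0):
    """Return (critic_loss, actor_loss, ent_coef_loss) for one batch.

    Each batch entry holds hypothetical, precomputed network outputs:
    Q-values of both critics and log-probabilities of sampled actions.
    """
    alpha = math.exp(log_alpha)
    critic_terms, actor_terms, alpha_terms = [], [], []
    for s in batch:
        # Soft Bellman target: alpha scales the entropy bonus in the target.
        next_value = min(s["q1_next"], s["q2_next"]) - alpha * s["logp_next"]
        target = s["reward"] + (1.0 - s["done"]) * gamma * next_value
        # Critic loss: squared error of both critics against the same target.
        critic_terms.append(
            0.5 * ((s["q1"] - target) ** 2 + (s["q2"] - target) ** 2)
        )
        # Actor loss: alpha-weighted log-probability minus the min Q-value.
        actor_terms.append(alpha * s["logp_pi"] - min(s["q1_pi"], s["q2_pi"]))
        # Entropy-coefficient loss: pushes policy entropy toward the target.
        alpha_terms.append(-log_alpha * (s["logp_pi"] + target_entropy))
    return mean(critic_terms), mean(actor_terms), mean(alpha_terms)


# Made-up numbers for a single transition, for illustration only.
batch = [{
    "reward": 1.0, "done": 0.0,
    "q1": 5.0, "q2": 5.5,
    "q1_next": 5.0, "q2_next": 6.0, "logp_next": -1.0,
    "q1_pi": 5.2, "q2_pi": 5.4, "logp_pi": -0.5,
}]
critic_loss, actor_loss, ent_coef_loss = sac_losses(batch, math.log(0.2))
print(critic_loss, actor_loss, ent_coef_loss)
```

If this sketch is right, $\alpha$ appears in both the critic target and the actor loss, and the actor loss is the scaled log-probability minus the Q-value estimate, so it would be negative whenever the Q-values are positive and outweigh the entropy term. I am not sure this is the right way to read the logged values, though.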


Thanks a lot in advance.

Manuel