How to interpret the training loss curves in Soft Actor-Critic (SAC)?

Asked Jul 01 '21 at 07:44

Active Jul 01 '21 at 07:44

Viewed 1,123 times

I am using the Stable-Baselines3 implementation of the Soft Actor-Critic (SAC) algorithm. The plotted training curves look promising. However, I am not sure how to interpret the actor and critic losses. The entropy coefficient $\alpha$ is learned automatically during training. As the entropy decreases, the critic loss and the actor loss decrease as well.
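For reference, the SAC objectives are usually written roughly as below. This is a sketch in the notation of the SAC paper, not something copied from the Stable-Baselines3 source: the symbols $\theta$, $\bar\theta$, $\phi$ and the target entropy $\bar{\mathcal{H}}$ are assumptions introduced here, the terminal-state mask is omitted, and the minimum over twin critics is a common implementation choice rather than a requirement.

$$
J_Q(\theta) = \mathbb{E}\left[\tfrac{1}{2}\left(Q_\theta(s,a) - y\right)^2\right], \qquad y = r + \gamma\left(\min_i Q_{\bar\theta_i}(s',a') - \alpha \log \pi_\phi(a' \mid s')\right)
$$

$$
J_\pi(\phi) = \mathbb{E}\left[\alpha \log \pi_\phi(a \mid s) - \min_i Q_{\theta_i}(s,a)\right]
$$

$$
J(\alpha) = \mathbb{E}\left[-\alpha\left(\log \pi_\phi(a \mid s) + \bar{\mathcal{H}}\right)\right]
$$

Written this way, $\alpha$ enters the critic loss through the target $y$ and enters the actor loss directly as the weight on $\log \pi_\phi$, so a change in $\alpha$ shifts the scale of both curves.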

How does the entropy coefficient affect the losses?
Can this be interpreted as the estimations becoming more accurate as the focus is shifted from exploration to exploitation?
How can negative actor losses be interpreted, and what do actor losses tell us in general?

Thanks a lot in advance

asked Jul 01 '21 at 07:44

Manuel

I am getting a negative actor loss with the SAC implementation from Stable-Baselines3. Any insight on this? – devil in the detail Dec 06 '21 at 16:29
@devilinthedetail that's expected and fine if the expected return is positive :) – kmf Jun 06 '22 at 15:12
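
To make the comment above concrete, here is a minimal sketch, assuming the actor loss is the batch mean of $\alpha \log \pi(a|s) - Q(s,a)$ as in the SAC paper. The function name `sac_actor_loss` and all numbers are hypothetical and chosen only for illustration; this is not the Stable-Baselines3 code.

```python
from statistics import mean


def sac_actor_loss(alpha, log_probs, q_values):
    """Batch mean of alpha * log pi(a|s) - Q(s, a) (assumed form of the actor loss)."""
    return mean(alpha * lp - q for lp, q in zip(log_probs, q_values))


# Hypothetical batch: made-up log-probabilities of the sampled actions.
log_probs = [-1.0, -0.5, -2.0]

# Made-up critic estimates for a task with positive vs. negative returns.
q_positive = [10.0, 12.0, 8.0]
q_negative = [-10.0, -12.0, -8.0]

loss_pos = sac_actor_loss(0.2, log_probs, q_positive)
loss_neg = sac_actor_loss(0.2, log_probs, q_negative)

# The sign of the loss mostly follows the sign of -Q when alpha is small.
print(loss_pos < 0, loss_neg > 0)
```

Under this assumed form, the actor loss is roughly the negated Q-estimate plus a small entropy term, so its sign and magnitude mostly track the scale of the returns rather than indicating how well the policy is being fit.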

0 Answers