Based on the OpenAI Spinning Up description of Soft Actor-Critic (SAC), the soft Q-function is defined as

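$$Q^\pi(s,a) = \mathbb{E}_{\tau \sim \pi}\left[ \sum_{t=0}^{\infty} \gamma^t R(s_t, a_t, s_{t+1}) + \alpha \sum_{t=1}^{\infty} \gamma^t H\left(\pi(\cdot \mid s_t)\right) \,\middle|\, s_0 = s,\ a_0 = a \right]$$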

and as they say

Q value is changed to include the entropy bonuses from every timestep except the first.

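I feel like it should make sense somehow, but they give no further explanation, and I don't see why it is correct, especially because the soft value function does include the first bonus term:

$$V^\pi(s) = \mathbb{E}_{\tau \sim \pi}\left[ \sum_{t=0}^{\infty} \gamma^t \Big( R(s_t, a_t, s_{t+1}) + \alpha H\left(\pi(\cdot \mid s_t)\right) \Big) \,\middle|\, s_0 = s \right]$$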
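For context, the same Spinning Up page relates the two functions by the identity below. I am writing it out as I understand it, so treat it as my reading rather than a derivation: the $t = 0$ entropy term $\alpha H(\pi(\cdot \mid s))$ depends only on the state, not on the action that is fixed in $Q^\pi(s,a)$, and it seems to be added back once the action is averaged out.

$$V^\pi(s) = \mathbb{E}_{a \sim \pi}\left[ Q^\pi(s,a) \right] + \alpha H\left(\pi(\cdot \mid s)\right)$$

If that reading is right, the two definitions are at least consistent with each other, but it still does not tell me why the first bonus is left out of $Q^\pi$ in the first place.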

Could someone please explain this?

Daniel