In this blog article by OpenAI, they say the standard deviation (std) of the exploration distribution must be state-dependent, i.e. an output of the policy network, so that it works with the entropy bonus, which is an integral part of the SAC algorithm.
My question is: does the std always have to be state-dependent when an entropy bonus is used? OpenAI's PPO baselines implementation uses a state-independent std for the exploration distribution.
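To make the distinction concrete, here is a minimal NumPy sketch of the two parameterizations I mean. The toy "networks" (single linear heads `W_mu`, `W_logstd`) and the parameter `log_std_param` are hypothetical names for illustration, not from either codebase:

```python
import numpy as np

rng = np.random.default_rng(0)

# State-dependent std (SAC-style): the policy network maps the state to
# both the mean and the log-std, so the exploration noise varies per state.
W_mu = rng.normal(size=(2, 4))
W_logstd = rng.normal(size=(2, 4))

def state_dependent_policy(state):
    mu = W_mu @ state
    log_std = W_logstd @ state        # std is a function of the state
    return mu, np.exp(log_std)

# State-independent std (PPO-baselines-style): log-std is a free learned
# parameter shared across all states; the network only outputs the mean.
log_std_param = np.zeros(2)

def state_independent_policy(state):
    mu = W_mu @ state
    return mu, np.exp(log_std_param)  # same std for every state

s1, s2 = rng.normal(size=4), rng.normal(size=4)
print(np.allclose(state_dependent_policy(s1)[1], state_dependent_policy(s2)[1]))
print(np.allclose(state_independent_policy(s1)[1], state_independent_policy(s2)[1]))
```

Running this prints `False` then `True`: the first policy's std changes with the state, while the second keeps one shared std regardless of the input.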