In this blog article by OpenAI, they say the standard deviation (std) of the exploration distribution must be state-dependent, i.e. an output of the policy network, so that it works with the entropy bonus, which is an integral part of the SAC algorithm.
My question is: does the std always have to be state-dependent when an entropy bonus is used? OpenAI's PPO baselines implementation uses a state-independent std for the exploration distribution.
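To make the distinction concrete, here is a minimal NumPy sketch of the two parameterizations I mean. The toy "networks" (single linear heads `W_mu`, `W_logstd`) and the parameter `log_std_param` are hypothetical names for illustration, not from either codebase:

```python
import numpy as np

rng = np.random.default_rng(0)

# State-dependent std (SAC-style): the policy network maps the state to
# both the mean and the log-std, so the exploration noise varies per state.
W_mu = rng.normal(size=(2, 4))
W_logstd = rng.normal(size=(2, 4))

def state_dependent_policy(state):
    mu = W_mu @ state
    log_std = W_logstd @ state        # std is a function of the state
    return mu, np.exp(log_std)

# State-independent std (PPO-baselines-style): log-std is a free learned
# parameter shared across all states; the network only outputs the mean.
log_std_param = np.zeros(2)

def state_independent_policy(state):
    mu = W_mu @ state
    return mu, np.exp(log_std_param)  # same std for every state

s1, s2 = rng.normal(size=4), rng.normal(size=4)
print(np.allclose(state_dependent_policy(s1)[1], state_dependent_policy(s2)[1]))
print(np.allclose(state_independent_policy(s1)[1], state_independent_policy(s2)[1]))
```

Running this prints `False` then `True`: the first policy's std changes with the state, while the second keeps one shared std regardless of the input.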