Can I treat a stochastic policy (over a finite action space of size $n$) as a deterministic policy (over the set of probability distributions, i.e. the simplex in $\mathbb{R}^n$)?
It seems to me that nothing is broken by this mental translation, except that the "induced environment" now has to take a stochastic action (a distribution) and spit out the next state, which is not hard to build on top of the original environment: just sample a concrete action from the distribution and step the original environment with it. Is this legit? If yes, how does this "deterministify then DDPG" approach compare to, for example, A2C?
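To make the construction concrete, here is a minimal sketch of the induced environment I have in mind, written as a Gymnasium `ActionWrapper` (the class name `DistributionActionWrapper` and the clipping/renormalization step are my own choices, not from any library):

```python
import numpy as np
import gymnasium as gym


class DistributionActionWrapper(gym.ActionWrapper):
    """Hypothetical 'induced environment': accepts a probability
    distribution over the n discrete actions, samples one concrete
    action from it, and steps the underlying environment with that."""

    def __init__(self, env):
        super().__init__(env)
        n = env.action_space.n  # original finite action space of size n
        # New (continuous) action space: the simplex, embedded in [0, 1]^n.
        self.action_space = gym.spaces.Box(low=0.0, high=1.0, shape=(n,))

    def action(self, dist):
        # Project back onto the simplex defensively, since a DDPG actor's
        # output plus exploration noise need not be a valid distribution.
        dist = np.clip(np.asarray(dist, dtype=np.float64), 0.0, None)
        dist = dist / dist.sum()
        # Sample a concrete discrete action from the distribution.
        return np.random.choice(len(dist), p=dist)
```

A continuous-control algorithm like DDPG could then be run unchanged on the wrapped environment, with its actor emitting points in $[0,1]^n$ that get projected onto the simplex before sampling.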