Can I treat a stochastic policy (over a finite action space of size $n$) as a deterministic policy (over the set of probability distributions, i.e. the simplex in $\mathbb{R}^n$)?
It seems to me that nothing is broken by this mental translation, except that the "induced environment" now has to take a stochastic action (a distribution) and spit out the next state, which is not hard to build on top of the original environment: just sample a concrete action from the distribution and step the original environment with it. Is this legit? If yes, how does this "deterministify then DDPG" approach compare to, for example, A2C?
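To make the construction concrete, here is a minimal sketch of the induced environment I have in mind, written as a Gymnasium `ActionWrapper` (the class name `DistributionActionWrapper` and the clipping/renormalization step are my own choices, not from any library):

```python
import numpy as np
import gymnasium as gym


class DistributionActionWrapper(gym.ActionWrapper):
    """Hypothetical 'induced environment': accepts a probability
    distribution over the n discrete actions, samples one concrete
    action from it, and steps the underlying environment with that."""

    def __init__(self, env):
        super().__init__(env)
        n = env.action_space.n  # original finite action space of size n
        # New (continuous) action space: the simplex, embedded in [0, 1]^n.
        self.action_space = gym.spaces.Box(low=0.0, high=1.0, shape=(n,))

    def action(self, dist):
        # Project back onto the simplex defensively, since a DDPG actor's
        # output plus exploration noise need not be a valid distribution.
        dist = np.clip(np.asarray(dist, dtype=np.float64), 0.0, None)
        dist = dist / dist.sum()
        # Sample a concrete discrete action from the distribution.
        return np.random.choice(len(dist), p=dist)
```

A continuous-control algorithm like DDPG could then be run unchanged on the wrapped environment, with its actor emitting points in $[0,1]^n$ that get projected onto the simplex before sampling.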