For questions about policy-based (or policy-search) reinforcement learning (RL) methods: RL algorithms that learn a policy directly, rather than first learning a value function. An example is REINFORCE, which belongs to the family of "policy gradient" algorithms, the subset of policy-based algorithms that use gradient information to guide the search.
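To make the description concrete, here is a minimal sketch of REINFORCE on a toy two-armed bandit (the bandit, its reward means, and the learning rate are illustrative assumptions, not taken from any question below): the parameters score actions directly, and no value function is learned.

```python
# Minimal REINFORCE sketch on a hypothetical two-armed bandit (illustrative only).
import numpy as np

rng = np.random.default_rng(0)
theta = np.zeros(2)                 # policy parameters: one logit per action
true_means = np.array([0.2, 0.8])   # assumed expected reward of each arm
alpha = 0.1                         # learning rate

def policy_probs(theta):
    """Softmax policy over the two actions."""
    z = np.exp(theta - theta.max())
    return z / z.sum()

for episode in range(2000):
    probs = policy_probs(theta)
    a = rng.choice(2, p=probs)               # sample an action from the policy
    r = rng.normal(true_means[a], 0.1)       # observe a stochastic reward
    # For a softmax policy, grad of log pi(a) w.r.t. theta is one_hot(a) - probs.
    grad_log_pi = -probs
    grad_log_pi[a] += 1.0
    theta += alpha * r * grad_log_pi         # ascend the policy gradient

print(policy_probs(theta))  # most probability should end up on the better arm
```

The update ascends $r \, \nabla_\theta \log \pi_\theta(a)$; in practice a baseline is usually subtracted from $r$ to reduce variance.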
Questions tagged [policy-based-methods]
8 questions
6 votes, 1 answer
What are the advantages of RL with actor-critic methods over actor-only methods?
In general, what are the advantages of RL with actor-critic methods over actor-only (or policy-based) methods?
This is not a comparison with Q-learning-style methods; rather, I mean a method that learns the game using only an actor.
I think it's…

ground clown
5 votes, 1 answer
Is reinforcement learning only about determining the value function?
I started reading some reinforcement learning literature, and it seems to me that all approaches to solving reinforcement learning problems are about finding the value function (the state-value function or the state-action value function).
Are there any…

Felix P.
2 votes, 1 answer
What makes TRPO an actor-critic method? Where is the critic?
From what I understand, Trust Region Policy Optimization (TRPO) is a modification of Natural Policy Gradient (NPG) that derives the optimal step size $\beta$ from a KL constraint between the new and old policies.
NPG is a modification of "vanilla"…

thesofakillers
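For context on the step size the question mentions (a sketch of the standard derivation, in my notation, not quoted from the question): NPG maximizes the linearized objective $g^\top \Delta\theta$ subject to the quadratic KL approximation $\tfrac{1}{2}\Delta\theta^\top F \Delta\theta \le \delta$, which gives

$$\Delta\theta^* = \beta F^{-1} g, \qquad \beta = \sqrt{\frac{2\delta}{g^\top F^{-1} g}},$$

where $g$ is the policy gradient, $F$ the Fisher information matrix of the policy, and $\delta$ the KL bound; TRPO additionally backtracks along this direction to enforce the exact constraint.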
1 vote, 0 answers
Is it possible to combine two policy-based RL agents?
I am developing an RL agent for a game environment. I have found that there are two strategies for doing well in the game. So I have trained two RL agents using neural networks with distinct reward functions. Each reward function corresponds to one…

BlackBrain
1 vote, 0 answers
What are the disadvantages of actor-only methods with respect to value-based ones?
While the advantages of actor-only algorithms, those that search for the policy directly without using a value function, are clear (the possibility of a continuous action space, a stochastic policy, etc.), I can't figure out the…

unter_983
1 vote, 0 answers
Are policy-based methods better than value-based methods only for large action spaces?
In different books on reinforcement learning, policy-based methods are motivated by their ability to handle large (continuous) action spaces. Is this the only motivation for policy-based methods? What if the action space is tiny (say, only 9…

tmaric
0 votes, 0 answers
As someone starting out in RL, could you help me understand the differences between actor-only, critic-only, and actor-critic methods?
I have been reading some Medium articles, and these three methods pop up a lot. I am wondering what the differences between them are, what the advantages of one over the others are, etc. Also, from my understanding, an actor-only method is synonymous with…

No-Time-To-Day
0 votes, 1 answer
How to derive the dual function step by step in relative entropy policy search (REPS)?
TL;DR: (why) is one of the terms in the expectation not derived properly?
Relative entropy policy search (REPS) is used to optimize a policy in an MDP. The update step is constrained in policy space (?) by a KL-divergence bound to stabilize the…

Sanyou
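For reference on the dual that the last question asks about (a sketch of the standard sample-based REPS dual in the style of Peters et al., 2010; $\eta$, $\varepsilon$, and $\delta_i$ follow that notation and are not quoted from the question): eliminating the policy from the Lagrangian of the KL-constrained problem in closed form leaves

$$g(\eta) = \eta\,\varepsilon + \eta \log \frac{1}{N}\sum_{i=1}^{N} \exp\!\left(\frac{\delta_i}{\eta}\right),$$

to be minimized over $\eta > 0$, where $\varepsilon$ is the KL bound and $\delta_i$ is the Bellman error of the $i$-th sample (itself a function of the value-function parameters, which are optimized jointly).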