Questions tagged [policy-based-methods]

For questions about policy-based (or policy search) reinforcement learning (RL) methods, i.e. RL algorithms that directly learn a policy rather than first learning a value function. An example of a policy search algorithm is REINFORCE, which falls into the category of "policy gradient" algorithms (or "policy gradients"), a subset of policy-based algorithms that use gradient information to guide the search.
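As a concrete illustration of the policy gradient idea, here is a minimal REINFORCE sketch on a toy 3-armed bandit (pure NumPy; the reward means are made up for illustration). For a softmax policy over action preferences $\theta$, the log-policy gradient has the closed form $\nabla_\theta \log \pi(a) = \text{onehot}(a) - \pi$, so the whole update fits in a few lines:

```python
import numpy as np

# Minimal REINFORCE sketch on a 3-armed bandit (illustrative only).
# The policy is a softmax over per-action preferences theta; for a
# softmax policy, grad log pi(a) = onehot(a) - pi, so the REINFORCE
# update theta += alpha * G * grad log pi(a) has a closed form.

rng = np.random.default_rng(0)
true_means = np.array([0.1, 0.5, 0.9])  # assumed reward means (made up)
theta = np.zeros(3)                     # action preferences
alpha = 0.1                             # learning rate

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

for step in range(2000):
    pi = softmax(theta)
    a = rng.choice(3, p=pi)              # sample an action from the policy
    G = rng.normal(true_means[a], 0.1)   # sampled return (one-step episode)
    grad_log_pi = -pi
    grad_log_pi[a] += 1.0                # onehot(a) - pi
    theta += alpha * G * grad_log_pi     # REINFORCE update

print("learned policy:", softmax(theta))  # should concentrate on the last arm
```

Note the sketch uses the raw return $G$ with no baseline; subtracting a learned baseline (as actor-critic methods do, see the questions below) reduces the variance of this update.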

8 questions
6
votes
1 answer

What are the advantages of RL with actor-critic methods over actor-only methods?

In general, what are the advantages of actor-critic methods over actor-only (or policy-based) methods in RL? This is not a comparison with the Q-learning family, but with methods that learn the game using only an actor. I think it's…
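For context (this framing is not from the question itself): the usual way the distinction is written is that an actor-only method such as REINFORCE weights the log-policy gradient by the sampled Monte Carlo return $G_t$, whereas an actor-critic method replaces it with a signal from a learned critic, e.g. the TD error, trading some bias for much lower variance. A sketch of the two update targets, with a critic $V_w$ assumed:

```latex
% Actor-only (REINFORCE): high-variance Monte Carlo return as the weight
\nabla_\theta J(\theta) = \mathbb{E}\big[\, \nabla_\theta \log \pi_\theta(a_t \mid s_t)\, G_t \,\big]

% Actor-critic: the critic V_w supplies a bootstrapped, lower-variance signal
\delta_t = r_{t+1} + \gamma V_w(s_{t+1}) - V_w(s_t), \qquad
\nabla_\theta J(\theta) \approx \mathbb{E}\big[\, \nabla_\theta \log \pi_\theta(a_t \mid s_t)\, \delta_t \,\big]
```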
5
votes
1 answer

Is reinforcement learning only about determining the value function?

I started reading some reinforcement learning literature, and it seems to me that all approaches to solving reinforcement learning problems are about finding the value function (the state-value function or the state-action value function). Are there any…
2
votes
1 answer

What makes TRPO an actor-critic method? Where is the critic?

From what I understand, Trust Region Policy Optimization (TRPO) is a modification of Natural Policy Gradient (NPG) that derives the optimal step size $\beta$ from a KL constraint between the new and old policies. NPG is a modification of "vanilla"…
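For reference, the step size the question refers to comes from the standard NPG/TRPO derivation (notation assumed here): linearize the objective, approximate the KL constraint quadratically with the Fisher information matrix $F$, and solve the constrained problem for the policy gradient $g$:

```latex
% Maximize the linearized objective under a quadratic KL trust region:
\max_{\Delta\theta}\; g^\top \Delta\theta
\quad \text{s.t.} \quad \tfrac{1}{2}\, \Delta\theta^\top F\, \Delta\theta \le \delta

% Solution: a natural-gradient step with step size beta
\Delta\theta = \beta\, F^{-1} g, \qquad
\beta = \sqrt{\frac{2\delta}{\,g^\top F^{-1} g\,}}
```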
1
vote
0 answers

Is it possible to combine two policy-based RL agents?

I am developing an RL agent for a game environment. I have found out that there are two strategies to do well in the game. So I have trained two RL agents using neural networks with distinct reward functions. Each reward function corresponds to one…
1
vote
0 answers

What are the disadvantages of actor-only methods with respect to value-based ones?

While the advantages of actor-only algorithms, i.e. those that search directly in policy space without using a value function, are clear (the possibility of handling a continuous action space, learning a stochastic policy, etc.), I can't figure out the…
1
vote
0 answers

Are policy-based methods better than value-based methods only for large action spaces?

In different books on reinforcement learning, policy-based methods are motivated by their ability to handle large (or continuous) action spaces. Is this the only motivation for policy-based methods? What if the action space is tiny (say, only 9…
0
votes
0 answers

As someone starting out in RL, could you help me understand the differences between actor-only, critic-only, and actor-critic methods?

I have been reading some Medium articles, and these three methods pop up a lot. I am wondering what the differences between them are, what the advantages of one over the others are, etc. Also, from my understanding, the actor-only method is synonymous with…
0
votes
1 answer

How to derive the dual function step by step in relative entropy policy search (REPS)?

TL;DR: (why) is one of the terms in the expectation not derived properly? Relative entropy policy search (REPS) is used to optimize a policy in an MDP. The update step is constrained in policy space (?) by the KL divergence to stabilize the…
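For orientation (a simplified episodic case, not the full MDP derivation with Bellman constraints that the question asks about): REPS maximizes expected return subject to a KL bound $\mathrm{KL}(p \,\|\, q) \le \epsilon$ on the new sample distribution $p$ relative to the old one $q$. Lagrangian duality then reduces the problem to a one-dimensional dual in the temperature $\eta$:

```latex
% Dual function of the (simplified, episodic) REPS problem:
g(\eta) = \eta\,\epsilon + \eta \log \mathbb{E}_{x \sim q}\!\left[ \exp\!\left( \frac{R(x)}{\eta} \right) \right],
\qquad \eta > 0

% The optimal distribution is an exponential reweighting of q:
p^*(x) \propto q(x)\, \exp\!\left( \frac{R(x)}{\eta^*} \right)
```

In the full MDP formulation of Peters et al. (2010), the return $R(x)$ in the exponent is replaced by a Bellman error term, which is where the expectation the question asks about arises.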