For questions about policy-based (or policy-search) reinforcement learning (RL) methods: RL algorithms that learn a policy directly, rather than first learning a value function. An example is REINFORCE, which belongs to the family of "policy gradient" algorithms, the subset of policy-based algorithms that use gradient information to guide the search.
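To make the description concrete, here is a minimal sketch of REINFORCE on a toy two-armed bandit (the bandit, its reward means, and the learning rate are illustrative assumptions, not taken from any question below): the parameters score actions directly, and no value function is learned.

```python
# Minimal REINFORCE sketch on a hypothetical two-armed bandit (illustrative only).
import numpy as np

rng = np.random.default_rng(0)
theta = np.zeros(2)                 # policy parameters: one logit per action
true_means = np.array([0.2, 0.8])   # assumed expected reward of each arm
alpha = 0.1                         # learning rate

def policy_probs(theta):
    """Softmax policy over the two actions."""
    z = np.exp(theta - theta.max())
    return z / z.sum()

for episode in range(2000):
    probs = policy_probs(theta)
    a = rng.choice(2, p=probs)               # sample an action from the policy
    r = rng.normal(true_means[a], 0.1)       # observe a stochastic reward
    # For a softmax policy, grad of log pi(a) w.r.t. theta is one_hot(a) - probs.
    grad_log_pi = -probs
    grad_log_pi[a] += 1.0
    theta += alpha * r * grad_log_pi         # ascend the policy gradient

print(policy_probs(theta))  # most probability should end up on the better arm
```

The update ascends $r \, \nabla_\theta \log \pi_\theta(a)$; in practice a baseline is usually subtracted from $r$ to reduce variance.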
Questions tagged [policy-based-methods]
8 questions
6 votes, 1 answer
What are the advantages of RL with actor-critic methods over actor-only methods?
In general, what are the advantages of RL with actor-critic methods over actor-only (or policy-based) methods?
This is not a comparison with Q-learning-style methods; rather, I mean a method that learns the game using only an actor.
I think it's…

ground clown
5 votes, 1 answer
Is reinforcement learning only about determining the value function?
I started reading some reinforcement learning literature, and it seems to me that all approaches to solving reinforcement learning problems are about finding the value function (the state-value function or the state-action value function).
Are there any…

Felix P.
2 votes, 1 answer
What makes TRPO an actor-critic method? Where is the critic?
From what I understand, Trust Region Policy Optimization (TRPO) is a modification of Natural Policy Gradient (NPG) that derives the optimal step size $\beta$ from a KL constraint between the new and old policies.
NPG is a modification of "vanilla"…

thesofakillers
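For context on the step size the question mentions (a sketch of the standard derivation, in my notation, not quoted from the question): NPG maximizes the linearized objective $g^\top \Delta\theta$ subject to the quadratic KL approximation $\tfrac{1}{2}\Delta\theta^\top F \Delta\theta \le \delta$, which gives

$$\Delta\theta^* = \beta F^{-1} g, \qquad \beta = \sqrt{\frac{2\delta}{g^\top F^{-1} g}},$$

where $g$ is the policy gradient, $F$ the Fisher information matrix of the policy, and $\delta$ the KL bound; TRPO additionally backtracks along this direction to enforce the exact constraint.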
1 vote, 0 answers
Is it possible to combine two policy-based RL agents?
I am developing an RL agent for a game environment. I have found that there are two strategies for doing well in the game. So I have trained two RL agents using neural networks with distinct reward functions. Each reward function corresponds to one…

BlackBrain
1 vote, 0 answers
What are the disadvantages of actor-only methods with respect to value-based ones?
While the advantages of actor-only algorithms, those that search for the policy directly without using a value function, are clear (the possibility of a continuous action space, a stochastic policy, etc.), I can't figure out the…

unter_983
1 vote, 0 answers
Are policy-based methods better than value-based methods only for large action spaces?
In different books on reinforcement learning, policy-based methods are motivated by their ability to handle large (continuous) action spaces. Is this the only motivation for policy-based methods? What if the action space is tiny (say, only 9…

tmaric
0 votes, 0 answers
As someone starting out in RL, could you help me understand the differences between actor-only, critic-only, and actor-critic methods?
I have been reading some Medium articles, and these three methods pop up a lot. I am wondering what the differences between them are, what the advantages of one over the others are, etc. Also, from my understanding, an actor-only method is synonymous with…

No-Time-To-Day
0 votes, 1 answer
How to derive the dual function step by step in relative entropy policy search (REPS)?
TL;DR: (why) is one of the terms in the expectation not derived properly?
Relative entropy policy search (REPS) is used to optimize a policy in an MDP. The update step is constrained in policy space (?) by a KL-divergence bound to stabilize the…

Sanyou
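For reference on the dual that the last question asks about (a sketch of the standard sample-based REPS dual in the style of Peters et al., 2010; $\eta$, $\varepsilon$, and $\delta_i$ follow that notation and are not quoted from the question): eliminating the policy from the Lagrangian of the KL-constrained problem in closed form leaves

$$g(\eta) = \eta\,\varepsilon + \eta \log \frac{1}{N}\sum_{i=1}^{N} \exp\!\left(\frac{\delta_i}{\eta}\right),$$

to be minimized over $\eta > 0$, where $\varepsilon$ is the KL bound and $\delta_i$ is the Bellman error of the $i$-th sample (itself a function of the value-function parameters, which are optimized jointly).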