
Why are policy gradient methods popular in reinforcement learning when there exists a dual LP formulation in terms of occupation measures that can be solved easily?

nbro
  • Please clarify your specific problem or provide additional details to highlight exactly what you need. As it's currently written, it's hard to tell exactly what you're asking. – Community Oct 13 '22 at 11:33
  • Can you please clarify what you mean by "dual LP formulation in terms of occupation measures". I know what linear programming is, but what do you mean by "occupation measures" and "dual LP formulation in terms of occupation measures"? – nbro Dec 31 '22 at 12:34
  • Occupation measure refers to an MDP's discounted state-action visitation frequency under a given policy. For a detailed explanation of how it can be used to obtain policies, and of the corresponding dual formulation, see Section 6.9 (Linear Programming) of the book Markov Decision Processes (1994) by Martin L. Puterman. – blackbird_h71 Jan 03 '23 at 05:17
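To make the definition in the comment above concrete, here is a small sketch that computes the occupation measure of a fixed policy in a toy 2-state, 2-action MDP (all transition probabilities, the policy, and the discount factor are invented for illustration):

```python
import numpy as np

# Toy 2-state, 2-action discounted MDP; all numbers are invented for illustration.
gamma = 0.9
P = np.array([                      # P[a, s, s'] = transition probabilities
    [[0.8, 0.2], [0.3, 0.7]],       # action 0
    [[0.1, 0.9], [0.6, 0.4]],       # action 1
])
pi = np.array([[0.5, 0.5],          # pi[s, a] = a fixed (arbitrary) policy
               [0.2, 0.8]])
mu0 = np.array([1.0, 0.0])          # initial state distribution

# State-to-state transition matrix under pi: P_pi[s, s'] = sum_a pi[s, a] * P[a, s, s']
P_pi = np.einsum('sa,ast->st', pi, P)

# Discounted state visitation d solves d = mu0 + gamma * P_pi^T d
d = np.linalg.solve(np.eye(2) - gamma * P_pi.T, mu0)

# Occupation measure: mu(s, a) = d(s) * pi(a | s); its entries sum to 1/(1-gamma)
occ = d[:, None] * pi
```

The total mass of the occupation measure is always 1/(1-gamma), which is a useful sanity check.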

1 Answer


Policy gradient methods are popular in reinforcement learning because they are fast and easy to implement. They also tend to work well on simple problems, for example when the Hellinger distance between two measures is small.
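As a rough sketch of the "easy to implement" point, here is a minimal REINFORCE-style policy gradient on a toy two-armed bandit; the reward means, learning rate, and baseline step size are arbitrary choices for illustration:

```python
import numpy as np

# Minimal REINFORCE sketch on a two-armed bandit; the reward means, learning
# rate, and baseline step size below are arbitrary choices for illustration.
rng = np.random.default_rng(0)
reward_means = np.array([0.2, 0.8])   # arm 1 is better
theta = np.zeros(2)                   # softmax policy parameters
alpha, baseline = 0.1, 0.0

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

for _ in range(2000):
    p = softmax(theta)
    a = rng.choice(2, p=p)                      # sample an action from pi_theta
    r = rng.normal(reward_means[a], 0.1)        # noisy reward
    grad_log = -p
    grad_log[a] += 1.0                          # gradient of log pi_theta(a)
    theta += alpha * (r - baseline) * grad_log  # REINFORCE update with baseline
    baseline += 0.05 * (r - baseline)           # running-average baseline

# softmax(theta) should now put most of its probability on the better arm
```

The whole method is a sampled stochastic-gradient loop, which is why it scales so naturally and is so widely used in practice.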

Dual LP methods may be more accurate when the problem is not too simple, but they can be harder to implement and may require more computational resources. They are preferable when you have more information about the underlying distribution of the data.
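For comparison, here is a hedged sketch of the dual (occupation-measure) LP for a toy 2-state, 2-action MDP, solved with scipy.optimize.linprog. All transition probabilities and rewards are invented; the LP maximizes expected discounted reward subject to the flow constraints on the occupation measure:

```python
import numpy as np
from scipy.optimize import linprog

# Dual (occupation-measure) LP for a toy 2-state, 2-action discounted MDP;
# all transition probabilities and rewards are invented for illustration.
gamma, nS, nA = 0.9, 2, 2
P = np.array([                      # P[a, s, s']
    [[0.8, 0.2], [0.3, 0.7]],
    [[0.1, 0.9], [0.6, 0.4]],
])
R = np.array([[1.0, 0.0],           # R[s, a]
              [0.0, 2.0]])
mu0 = np.array([0.5, 0.5])          # initial state distribution

# Maximize sum_{s,a} mu(s,a) R(s,a)  <=>  minimize -R (linprog minimizes)
c = -R.flatten()

# Flow constraints: sum_a mu(s',a) - gamma * sum_{s,a} P(s'|s,a) mu(s,a) = mu0(s')
A_eq = np.zeros((nS, nS * nA))
for s2 in range(nS):
    for s in range(nS):
        for a in range(nA):
            A_eq[s2, s * nA + a] = float(s == s2) - gamma * P[a, s, s2]

res = linprog(c, A_eq=A_eq, b_eq=mu0, bounds=[(0, None)] * (nS * nA))
mu = res.x.reshape(nS, nA)

# An optimal vertex corresponds to a deterministic policy: pi(s) = argmax_a mu(s, a)
policy = mu.argmax(axis=1)
```

Note the LP has one variable per state-action pair and one constraint per state, which illustrates why this approach becomes expensive for large or continuous state spaces.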

Faizy