In reinforcement learning, policies can be deterministic or stochastic (non-deterministic), but they can also be stationary or non-stationary.

What is the difference between a stationary and a non-stationary policy? How do you formalize both? Which problems (or environments) require a stationary policy as opposed to a non-stationary one (and vice-versa)?

  • [This Wikipedia article](https://en.wikipedia.org/wiki/Stationary_process) should also be useful. – nbro Jan 21 '21 at 15:22

1 Answer


A stationary policy is a policy that does not change over time, that is, $\pi_t = \pi, \forall t \geq 0$, where $\pi$ can either be a function, $\pi: S \rightarrow A$ (a deterministic policy), or a conditional distribution over actions given states, $\pi(a \mid s)$ (a stochastic policy). A non-stationary policy is a policy that is not stationary: there can be two different time steps $i \neq j$ such that $\pi_i \neq \pi_j$.
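
To make the distinction concrete, here is a minimal Python sketch of the three kinds of policy mentioned above. It is only an illustration: states and actions are plain integers, and every concrete number (the time threshold, the probabilities) is made up.

```python
import random

def stationary_policy(state):
    """A stationary deterministic policy pi(s): the same mapping at every time step."""
    return state % 2  # the action depends only on the state, never on t

def non_stationary_policy(state, t):
    """A non-stationary policy pi_t(s): the mapping may change with the time step t."""
    if t < 10:                        # e.g. explore during the first 10 steps
        return random.choice([0, 1])
    return state % 2                  # then act like the stationary policy above

def stochastic_stationary_policy(state):
    """A stochastic stationary policy pi(a | s): a fixed distribution over actions."""
    probs = [0.7, 0.3] if state % 2 == 0 else [0.2, 0.8]
    return random.choices([0, 1], weights=probs)[0]
```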

There are problems for which a stationary optimal policy is guaranteed to exist. For example, in a stochastic, discrete-time Markov decision process (MDP) with finitely many states and actions and bounded rewards (stochastic here means that the dynamics of the environment, i.e. the transition function and the reward function, are modeled by probability distributions), where the objective is the long-run average reward, a stationary optimal policy exists. The proof of this fact is in the book Markov Decision Processes: Discrete Stochastic Dynamic Programming (1994), by Martin L. Puterman, which apparently is not freely available on the web. Conversely, finite-horizon problems typically call for a non-stationary policy, because the optimal action in a state can depend on how many time steps remain before the horizon.
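
As a hedged illustration of such an existence result, the sketch below runs value iteration on a made-up two-state, two-action MDP. For simplicity it uses the discounted objective rather than the long-run average reward (a stationary deterministic optimal policy also exists in that setting, as covered by Puterman's book); the point is only that the greedy policy it returns is a single time-independent mapping from states to actions, i.e. a stationary policy.

```python
import numpy as np

# Made-up MDP: P[s, a, s'] are transition probabilities, R[s, a] expected rewards.
# All numbers are arbitrary and chosen only for illustration.
P = np.array([[[0.8, 0.2], [0.1, 0.9]],
              [[0.5, 0.5], [0.3, 0.7]]])
R = np.array([[1.0, 0.0],
              [0.0, 2.0]])
gamma = 0.9  # discount factor

V = np.zeros(2)
for _ in range(10_000):  # value iteration
    # Q[s, a] = R[s, a] + gamma * sum_{s'} P[s, a, s'] * V[s']
    Q = R + gamma * (P @ V)
    V_new = Q.max(axis=1)
    if np.abs(V_new - V).max() < 1e-12:
        break
    V = V_new

# The greedy policy is one fixed mapping state -> action, used at every time
# step: a stationary (and deterministic) policy.
pi = Q.argmax(axis=1)
print("optimal stationary policy:", pi)
```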

David