
What does stationary state distribution mean in reinforcement learning?

The context in which it is used is shown below.

[Image: excerpt of the source text in which the term is used]

DSPinfinity
    Could you give some context for where you saw or heard this, please? There are a couple of things it could mean, depending on whether we are talking about it as a trait of an environment or a trait of a policy (the latter typically in a continuing/non-episodic environment). There may be other contexts I don't know about as well. – Neil Slater Jun 04 '23 at 19:17
    I added the context in which it was used and updated the question. – DSPinfinity Jun 06 '23 at 09:34

1 Answer


In general, a stationary distribution is a distribution over something (e.g. actions or states) that does not change over time.
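
For a distribution over the states of a Markov chain with transition probabilities $P(s' \mid s)$, "does not change over time" is usually made precise as follows: $d$ is stationary if taking one more step of the chain leaves it unchanged. For a fixed policy $\pi$ in an MDP with dynamics $p(s' \mid s, a)$, the relevant chain is the one induced by the policy, $P_\pi(s' \mid s) = \sum_a \pi(a \mid s)\, p(s' \mid s, a)$:

$$
d(s') = \sum_{s} d(s)\, P(s' \mid s) \quad \forall s',
\qquad
d_\pi(s') = \sum_{s} d_\pi(s) \sum_{a} \pi(a \mid s)\, p(s' \mid s, a) \quad \forall s'.
$$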

Examples of non-stationary distributions are easy to find in multi-agent RL, for instance when modelling the policy of another agent: the opponent's policy can be non-stationary because it adapts to the other agents. In such a case the distribution changes with time or, more precisely, with the new experience gathered from the environment and the other agents.
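
As a toy illustration of that point (not taken from the text), the sketch below models a hypothetical opponent in a two-action game (heads/tails) whose probability of playing heads is the smoothed fraction of times it has seen us play heads. Our own policy is fixed, but the opponent's action distribution keeps moving as it gathers experience, so from our point of view it is non-stationary. The adaptation rule, the function name `opponent_policy`, and the numbers are assumptions chosen only for illustration.

```python
import random


def opponent_policy(heads_seen, t):
    """Hypothetical adaptive opponent: P(heads) is the Laplace-smoothed
    fraction of the first t rounds in which it saw us play heads."""
    return (heads_seen + 1) / (t + 2)


rng = random.Random(0)   # fixed seed so the run is reproducible
our_p_heads = 0.8        # our own policy never changes
heads_seen = 0
opponent_history = []    # opponent's P(heads) at each time step

for t in range(1000):
    # Record the opponent's current action distribution, then play one round.
    opponent_history.append(opponent_policy(heads_seen, t))
    if rng.random() < our_p_heads:
        heads_seen += 1

# The opponent starts at 0.5 and drifts towards our heads frequency (about 0.8).
for t in (0, 10, 100, 999):
    print(t, round(opponent_history[t], 3))
```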

In your case, $d_\pi(s)$ is a stationary distribution over states: given a fixed policy $\pi$ (one that does not change), the distribution over the states visited by the policy does not change over time either. More precisely, $d_\pi(s)$ is the long-run probability of being in state $s$ when following $\pi$; under the ergodicity assumption it exists and does not depend on the starting state, even though individual rollouts can visit different states. When the next line in the text says "the stationary distribution varies with the learned policy", it means that if you update $\pi$, then $d_\pi$ changes accordingly, assigning a different probability mass to each state $s$.
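
To make this concrete, here is a minimal pure-Python sketch that computes $d_\pi$ for a made-up 3-state, 2-action MDP by power iteration: it builds the state-to-state chain induced by a fixed policy and repeatedly applies it to a uniform starting distribution until it stops changing. The transition probabilities, the two policies, and the helper names (`policy_transition_matrix`, `step`, `stationary_distribution`) are hypothetical, and the code assumes the induced chain is ergodic so that the limit exists.

```python
# Hypothetical 3-state, 2-action MDP (numbers made up for illustration).
# P_ENV[s][a][s2] = probability of moving from state s to state s2 after action a.
P_ENV = [
    [[0.9, 0.1, 0.0], [0.1, 0.9, 0.0]],  # from state 0
    [[0.5, 0.5, 0.0], [0.0, 0.2, 0.8]],  # from state 1
    [[0.0, 0.5, 0.5], [0.3, 0.0, 0.7]],  # from state 2
]


def policy_transition_matrix(p, pi):
    """Chain induced by a fixed policy: P_pi[s][s2] = sum_a pi[s][a] * p[s][a][s2]."""
    n_states, n_actions = len(p), len(p[0])
    return [
        [sum(pi[s][a] * p[s][a][s2] for a in range(n_actions)) for s2 in range(n_states)]
        for s in range(n_states)
    ]


def step(d, P):
    """Advance a state distribution by one time step: d'(s2) = sum_s d(s) * P[s][s2]."""
    n = len(d)
    return [sum(d[s] * P[s][s2] for s in range(n)) for s2 in range(n)]


def stationary_distribution(P, max_iters=10_000, tol=1e-12):
    """Power iteration from a uniform start; assumes the chain is ergodic."""
    n = len(P)
    d = [1.0 / n] * n
    for _ in range(max_iters):
        d_next = step(d, P)
        if max(abs(x - y) for x, y in zip(d, d_next)) < tol:
            return d_next
        d = d_next
    return d


pi_uniform = [[0.5, 0.5], [0.5, 0.5], [0.5, 0.5]]  # uniform random policy
pi_other = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]    # a different, deterministic policy

d_uniform = stationary_distribution(policy_transition_matrix(P_ENV, pi_uniform))
d_other = stationary_distribution(policy_transition_matrix(P_ENV, pi_other))
print([round(x, 3) for x in d_uniform])  # [0.286, 0.357, 0.357]
print([round(x, 3) for x in d_other])    # [0.686, 0.086, 0.229]
```

Both induced chains here are irreducible and have self-loops, so power iteration converges. The two policies visit the same states but with different long-run frequencies, which is the sense in which the stationary distribution varies with the policy.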

Note that $d_\pi$ also depends on the transition dynamics of the environment, not only on $\pi$. The text is probably assuming implicitly that the environment does not change over time either, so that once both the environment and the policy are fixed, the long-run probability of being in each state is the same at every time step.
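
A small self-contained sketch of that dependence, assuming two hypothetical 2-state, 2-action environments with made-up dynamics: the policy is held fixed, yet each environment yields a different $d_\pi$. It also shows that, with both the environment and the policy fixed, a chain started in $d_\pi$ keeps the same state distribution at every later step. The helper names are illustrative, not from any library.

```python
def policy_transition_matrix(p, pi):
    """P_pi[s][s2] = sum_a pi[s][a] * p[s][a][s2] for a fixed policy pi."""
    n_states, n_actions = len(p), len(p[0])
    return [
        [sum(pi[s][a] * p[s][a][s2] for a in range(n_actions)) for s2 in range(n_states)]
        for s in range(n_states)
    ]


def step(d, P):
    """One time step of the chain: d'(s2) = sum_s d(s) * P[s][s2]."""
    n = len(d)
    return [sum(d[s] * P[s][s2] for s in range(n)) for s2 in range(n)]


def stationary_distribution(P, max_iters=10_000, tol=1e-12):
    """Power iteration from a uniform start; assumes the chain is ergodic."""
    d = [1.0 / len(P)] * len(P)
    for _ in range(max_iters):
        d_next = step(d, P)
        if max(abs(x - y) for x, y in zip(d, d_next)) < tol:
            return d_next
        d = d_next
    return d


# One fixed policy: uniform over 2 actions in each of 2 states.
PI = [[0.5, 0.5], [0.5, 0.5]]

# Two hypothetical environments with the same states and actions, different dynamics.
# p[s][a][s2] = probability of moving from s to s2 after taking action a.
P_ENV_1 = [[[0.8, 0.2], [0.6, 0.4]], [[0.3, 0.7], [0.1, 0.9]]]
P_ENV_2 = [[[0.2, 0.8], [0.4, 0.6]], [[0.7, 0.3], [0.9, 0.1]]]

chain_1 = policy_transition_matrix(P_ENV_1, PI)
chain_2 = policy_transition_matrix(P_ENV_2, PI)
d_1 = stationary_distribution(chain_1)
d_2 = stationary_distribution(chain_2)
print([round(x, 3) for x in d_1])  # [0.4, 0.6]
print([round(x, 3) for x in d_2])  # [0.533, 0.467]

# Starting from d_1 in environment 1, the distribution stays the same at each step.
d = d_1
for t in range(5):
    d = step(d, chain_1)
    print(t + 1, [round(x, 3) for x in d])
```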

Luca Anzalone
    Worth mentioning the ergodicity assumption and linking https://ai.stackexchange.com/questions/27196/what-is-ergodicity-in-a-markov-decision-process-mdp, which is what the quote from the question is about. – Neil Slater Jun 06 '23 at 20:33