Questions tagged [stationary-policy]

For questions related to the concept of a stationary policy (in reinforcement learning and other AI sub-fields).
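
As a rough illustration of the concept (a minimal sketch, not a canonical definition), the snippet below contrasts a policy whose action probabilities depend only on the state with one that also depends on the time step. The action names, the probability values, the `horizon` parameter and all function names are hypothetical and chosen only for the example.

```python
import random

ACTIONS = ["left", "right"]  # hypothetical action set


def stationary_policy(state):
    """Action probabilities depend only on the state, not on time."""
    # Hypothetical rule: prefer "right" in even-numbered states.
    p_right = 0.8 if state % 2 == 0 else 0.3
    return {"left": 1.0 - p_right, "right": p_right}


def non_stationary_policy(state, t, horizon=10):
    """Action probabilities depend on the state AND the time step t."""
    # Hypothetical rule: shift probability toward "left" as t nears the horizon.
    p_right = stationary_policy(state)["right"] * (1.0 - t / horizon)
    return {"left": 1.0 - p_right, "right": p_right}


def sample_action(probs, rng):
    """Draw one action from a dict mapping action -> probability."""
    actions = list(probs)
    weights = [probs[a] for a in actions]
    return rng.choices(actions, weights=weights, k=1)[0]


if __name__ == "__main__":
    rng = random.Random(0)
    # Same state, different time steps: only the second policy changes.
    for t in (0, 5, 9):
        print(t, stationary_policy(2), non_stationary_policy(2, t))
    print(sample_action(stationary_policy(2), rng))
```

In this sketch, calling `stationary_policy(2)` at any time step gives the same distribution, whereas `non_stationary_policy(2, t)` gives a different distribution for each `t`.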

5 questions
15 votes, 4 answers

What does "stationary" mean in the context of reinforcement learning?

I think I've seen the expressions "stationary data", "stationary dynamics" and "stationary policy", among others, in the context of reinforcement learning. What do they mean? I think a stationary policy means that the policy does not depend on time,…
8 votes, 1 answer

What is the difference between a stationary and a non-stationary policy?

In reinforcement learning, there are deterministic and non-deterministic (or stochastic) policies, but there are also stationary and non-stationary policies. What is the difference between a stationary and a non-stationary policy? How do you…
2 votes, 0 answers

Should I use the discounted average reward as objective in a finite-horizon problem?

I am new to reinforcement learning, but, for a finite-horizon application problem, I am considering using the average reward instead of the sum of rewards as the objective. Specifically, there are at most $T$ time steps in total (e.g.,…
1 vote, 0 answers

Why do bootstrapping methods produce nonstationary targets more than non-bootstrapping methods?

The following quote is taken from the beginning of the chapter on "Approximate Solution Methods" (p. 198) in "Reinforcement Learning" by Sutton & Barto (2018): reinforcement learning generally requires function approximation methods able to handle…
1 vote, 1 answer

What is the difference between the definition of a stationary policy in reinforcement learning and in contextual bandits?

A stationary policy is a function that maps a state to a probability distribution over actions. In a contextual bandit problem, the state itself does not include the history. But in a reinforcement learning problem, the history can be used to define a…