Questions tagged [markov-property]

For questions related to the Markov property or Markov assumption (that is, the assumption that the "future is independent of the past, given the present"), which underlies, for example, most reinforcement learning algorithms.
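The tag description above can be stated formally; a minimal sketch in LaTeX, where $S_t$ denotes the state at time $t$ (the notation is an assumption for illustration, not taken from any particular question below):

```latex
% Markov property: the conditional distribution of the next state
% depends only on the present state, not on the earlier history.
P(S_{t+1} = s' \mid S_t = s_t, S_{t-1} = s_{t-1}, \dots, S_0 = s_0)
  = P(S_{t+1} = s' \mid S_t = s_t)
```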

21 questions
5 votes, 1 answer

Why does TD Learning require Markovian domains?

One of my friends and I were discussing the differences between Dynamic Programming, Monte-Carlo, and Temporal Difference (TD) Learning as policy evaluation methods, and we agreed that Dynamic Programming requires the Markov assumption…
4 votes, 1 answer

How is the Markov property consistent in reinforcement learning based scheduling?

In Reinforcement Learning, an MDP model incorporates the Markov property. Many scheduling applications across many disciplines use reinforcement learning (mostly deep RL) to learn scheduling decisions. For example, the paper Learning…
4 votes, 1 answer

Is a neural network able to optimize itself for speed?

I am experimenting with OpenAI Gym and reinforcement learning. As far as I understand, the environment waits for the agent to make a decision, so it's a sequential operation like this: decision = agent.decide(state) state, reward, done =…
4 votes, 1 answer

How to assign rewards in a non-Markovian environment?

I am quite new to the Reinforcement Learning domain and I am curious about something. It seems that the majority of current research assumes Markovian environments, that is, environments where future states of the process depend only upon the present…
4 votes, 0 answers

What research has been done on learning non-Markovian reward functions?

Recently, some work has been done on planning and learning in Non-Markovian Decision Processes, that is, decision-making with temporally extended rewards. In these settings, a particular reward is received only when a particular temporal logic formula…
3 votes, 1 answer

Why can we take the action $a$ from the next state $s'$ in the max part of the Q-learning update rule, if that action doesn't lead to any reward?

I'm using OpenAI's cartpole environment. First of all, is this environment not Markov? Knowing that, my main question concerns Q-learning and off-policy methods: For me, there is something weird in updating a Q value based on the max Q for a state…
3 votes, 2 answers

Reinforcement Learning algorithm with rewards dependent both on previous action and current action

Problem description: Suppose we have an environment where the reward at time step $t$ depends not only on the current action, but also on the previous action, in the following way: if current action == previous action, you get reward = $R(a,s)$; if…
3 votes, 0 answers

Policy gradient: Does it use the Markov property?

To derive the policy gradient, we start by writing the equation for the probability of a certain trajectory (e.g. see spinningup tutorial): $$ \begin{align} P_\theta(\tau) &= P_\theta(s_0, a_0, s_1, a_1, \dots, s_T, a_T) \\ & = p(s_0) \prod_{i=0}^T…
3 votes, 1 answer

What does the Markov assumption say about the history of state sequences?

Does the Markov assumption say that the conditional probability of the next state depends only on the current state, or does it say that the conditional probability depends on a fixed finite number of previous states? As far as I understand from the…
3 votes, 2 answers

Can non-Markov environments also be deterministic?

The definition of deterministic environment I am familiar with goes as follows: the next state of the agent depends only on the current state and the action chosen by the agent. By exclusion, everything else would be a stochastic…
2 votes, 0 answers

Can $Q$-learning or SARSA be thought of as a Markov chain?

I might just be overthinking a very simple question, but nonetheless the following has been bugging me a lot. Given an MDP with non-trivial state and action sets, we can implement the SARSA algorithm to find the optimal policy or the optimal…
2 votes, 1 answer

Is the Markov property assumed in the forward algorithm?

I'm majoring in pure linguistics (not computational), and I don't have any background in computational science or mathematics. But I happen to be taking the "Automatic Speech Recognition" course in my graduate school and am struggling with it. I…
1 vote, 0 answers

What limitations does the Markov property place on real time learning?

The Markov property is the dependence of a system's future state probability distribution solely on the present state, excluding any dependence on past system history. The Markov property reduces computing resource requirements in…
1 vote, 0 answers

Implementation of an MDP in Python to determine when to take the action "clean"

I am trying to model the following problem as a Markov decision process. In the steel melting shop of a steel plant, iron pipes are used. These pipes develop rust over time. Adding an anti-rusting solution can delay the rusting process. If there is…
1 vote, 1 answer

What is the difference between environment states and agent states in terms of Markov property?

I'm going through the David Silver RL course on YouTube. He talks about the environment's internal state $S^e_t$ and the agent's internal state $S^a_t$. We know that a state $s$ is Markov…