For questions related to the Markov property or Markov assumption (that is, the assumption that the "future is independent of the past, given the present"), which underlies, for example, most reinforcement learning algorithms.
Questions tagged [markov-property]
21 questions
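As a point of reference (a standard definition, not taken from any one question below), the Markov property for a state-action process can be written as:

```latex
P(s_{t+1} \mid s_t, a_t, s_{t-1}, a_{t-1}, \dots, s_0, a_0) = P(s_{t+1} \mid s_t, a_t)
```

That is, conditioning on the full history adds nothing once the current state and action are known.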
5 votes · 1 answer
Why does TD Learning require Markovian domains?
One of my friends and I were discussing the differences between Dynamic Programming, Monte Carlo, and Temporal Difference (TD) Learning as policy evaluation methods, and we agreed that Dynamic Programming requires the Markov assumption…

stoic-santiago · 1,121
4 votes · 1 answer
How is the Markovian property consistent in reinforcement-learning-based scheduling?
In reinforcement learning, an MDP model incorporates the Markovian property. Many scheduling applications across many disciplines use reinforcement learning (mostly deep RL) to learn scheduling decisions. For example, the paper Learning…

ephemeral · 143
4 votes · 1 answer
Is a neural network able to optimize itself for speed?
I am experimenting with OpenAI Gym and reinforcement learning. As far as I understand, the environment waits for the agent to make a decision, so it's a sequential operation like this:
decision = agent.decide(state)
state, reward, done =…

Thomas Weller · 221
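The sequential interaction described in the question above can be sketched as a minimal reset/step loop. `SimpleEnv` and the random agent here are hypothetical stand-ins, but the calling pattern mirrors Gym's API, where the environment blocks until the agent supplies an action:

```python
import random

class SimpleEnv:
    """Toy stand-in for a Gym-style environment: episode ends after 10 steps."""
    def reset(self):
        self.t = 0
        return self.t  # initial state

    def step(self, action):
        self.t += 1
        done = self.t >= 10
        return self.t, 1.0, done  # next state, reward, episode-finished flag

env = SimpleEnv()
state = env.reset()
total_reward = 0.0
done = False
while not done:
    action = random.choice([0, 1])           # the agent's decision
    state, reward, done = env.step(action)   # environment advances one step per call
    total_reward += reward
print(total_reward)  # 10.0 for this toy episode
```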
4 votes · 1 answer
How to assign rewards in a non-Markovian environment?
I am quite new to the reinforcement learning domain and I am curious about something. It seems to be the case that the majority of current research assumes Markovian environments, that is, that future states of the process depend only upon the present…

thulungair · 43
4 votes · 0 answers
What research has been done on learning non-Markovian reward functions?
Recently, some work has been done on planning and learning in non-Markovian decision processes, that is, decision-making with temporally extended rewards. In these settings, a particular reward is received only when a particular temporal logic formula…

Gavin Rens · 41
3 votes · 1 answer
Why can we take the action $a$ from the next state $s'$ in the max part of the Q-learning update rule, if that action doesn't lead to any reward?
I'm using OpenAI's cartpole environment. First of all, is this environment not Markov?
Knowing that, my main question concerns Q-learning and off-policy methods:
For me, there is something weird in updating a Q value based on the max Q for a state…

JeanMi · 155
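The update the question asks about can be written out in a few lines. The tabular sketch below uses made-up states, Q-values, and learning rate, and shows that only $\max_{a'} Q(s', a')$ from the next state enters the target, regardless of which action is actually taken next:

```python
# Tabular Q-learning update:
#   Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
alpha, gamma = 0.5, 0.9

# Hypothetical Q-table over 2 states x 2 actions
Q = {(0, 0): 0.0, (0, 1): 0.0, (1, 0): 2.0, (1, 1): 4.0}

s, a, r, s_next = 0, 1, 1.0, 1
best_next = max(Q[(s_next, act)] for act in (0, 1))  # max over next-state actions
Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
print(Q[(0, 1)])  # 0.5 * (1.0 + 0.9 * 4.0) = 2.3
```

The max is a bootstrap estimate of future value, not a claim that the maximizing action will be executed; that is exactly what makes Q-learning off-policy.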
3 votes · 2 answers
Reinforcement Learning algorithm with rewards dependent both on previous action and current action
Problem description:
Suppose we have an environment where the reward at time step $t$ depends not only on the current action, but also on the previous action, in the following way:
if current action == previous action, you get reward $R(a,s)$
if…

FQT · 33
3 votes · 0 answers
Policy gradient: Does it use the Markov property?
To derive the policy gradient, we start by writing the equation for the probability of a certain trajectory (see, e.g., the Spinning Up tutorial):
$$
\begin{align}
P_\theta(\tau) &= P_\theta(s_0, a_0, s_1, a_1, \dots, s_T, a_T) \\
& = p(s_0) \prod_{i=0}^T…

Gerges · 131
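For reference, the product the excerpt truncates is the standard trajectory factorization (written here in the excerpt's notation, for a trajectory ending in $a_T$). This is precisely where the Markov property enters: each next-state factor conditions only on the current state and action.

```latex
P_\theta(\tau) = p(s_0) \prod_{i=0}^{T} \pi_\theta(a_i \mid s_i) \prod_{i=0}^{T-1} p(s_{i+1} \mid s_i, a_i)
```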
3 votes · 1 answer
What does the Markov assumption say about the history of state sequences?
Does the Markov assumption say that the conditional probability of the next state only depends on the current state or does it say that the conditional probability depends on a fixed finite number of previous states?
As far as I understand from the…

MScott · 445
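The two readings this question contrasts can be stated side by side: the first-order Markov assumption (the usual one in RL) conditions only on the present state, while a $k$-th order model conditions on a fixed window of the $k$ most recent states:

```latex
\begin{aligned}
\text{first order:} \quad & P(s_{t+1} \mid s_t, s_{t-1}, \dots, s_0) = P(s_{t+1} \mid s_t) \\
k\text{-th order:} \quad & P(s_{t+1} \mid s_t, s_{t-1}, \dots, s_0) = P(s_{t+1} \mid s_t, \dots, s_{t-k+1})
\end{aligned}
```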
3 votes · 2 answers
Can non-Markov environments also be deterministic?
The definition of deterministic environment I am familiar with goes as follows:
The next state of the agent depends only on the current state and the action chosen by the agent.
By exclusion, everything else would be a stochastic…

user9007131 · 63
2 votes · 0 answers
Can $Q$-learning or SARSA be thought of as a Markov chain?
I might just be overthinking a very simple question, but nonetheless the following has been bugging me a lot.
Given an MDP with non-trivial state and action sets, we can implement the SARSA algorithm to find the optimal policy or the optimal…

dezdichado · 182
2 votes · 1 answer
Is the Markov property assumed in the forward algorithm?
I'm majoring in pure linguistics (not computational), and I don't have any background in computational science or mathematics. But I happen to be taking the "Automatic Speech Recognition" course in my graduate school and am struggling with it.
I…

Jeeyoung Jeon · 23
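For context, the forward algorithm does rest on the first-order Markov assumption: the recursion computes $\alpha_t(j) = P(o_1 \dots o_t, q_t = j)$ using only the previous time step's $\alpha$ values. A minimal sketch with a made-up two-state HMM (all parameters are illustrative, not from the question):

```python
# Toy HMM: 2 hidden states, transition A, emission B, initial distribution pi.
A  = [[0.7, 0.3], [0.4, 0.6]]   # A[i][j] = P(next state = j | current state = i)
B  = [[0.9, 0.1], [0.2, 0.8]]   # B[j][o] = P(observation = o | state = j)
pi = [0.5, 0.5]

def forward(obs):
    """Return P(observation sequence) via the forward recursion."""
    alpha = [pi[j] * B[j][obs[0]] for j in range(2)]
    for o in obs[1:]:
        # Markov assumption: alpha at time t depends only on alpha at t-1,
        # so the full history never needs to be enumerated.
        alpha = [sum(alpha[i] * A[i][j] for i in range(2)) * B[j][o]
                 for j in range(2)]
    return sum(alpha)

print(forward([0, 1, 0]))
```

Without the Markov assumption, the sum over hidden-state sequences would grow exponentially in the sequence length; the recursion keeps it linear.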
1 vote · 0 answers
What limitations does the Markov property place on real time learning?
The Markov property is the dependence of a system's future state probability distribution solely on the present state, excluding any dependence on past system history.
The presence of the Markov property saves computing resource requirements in…

Douglas Daseeco · 7,423
1 vote · 0 answers
Implementation of an MDP in Python to determine when to take the action "clean"
I am trying to model the following problem as a Markov decision process.
In a steel melting shop of a steel plant, iron pipes are used. These pipes generate rust over time. Adding an anti-rusting solution can delay the rusting process. If there is…

shan · 111
1 vote · 1 answer
What is the difference between environment states and agent states in terms of Markov property?
I'm going through the David Silver RL course on YouTube. He talks about the environment's internal state $S^e_t$ and the agent's internal state $S^a_t$.
We know that state $s$ is Markov…

Stanko Kovacevic · 13