For questions related to the Markov property or Markov assumption (that is, the assumption that the "future is independent of the past, given the present"), which underlies, for example, most reinforcement learning algorithms.
Questions tagged [markov-property]
21 questions
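As a point of reference (a standard definition, not taken from any one question below), the Markov property for a state-action process can be written as:

```latex
P(s_{t+1} \mid s_t, a_t, s_{t-1}, a_{t-1}, \dots, s_0, a_0) = P(s_{t+1} \mid s_t, a_t)
```

That is, conditioning on the full history adds nothing once the current state and action are known.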
5 votes · 1 answer
Why does TD Learning require Markovian domains?
One of my friends and I were discussing the differences between Dynamic Programming, Monte Carlo, and Temporal Difference (TD) Learning as policy evaluation methods, and we agreed that Dynamic Programming requires the Markov assumption…

stoic-santiago · 1,121
4 votes · 1 answer
How is the Markovian property consistent in reinforcement-learning-based scheduling?
In reinforcement learning, an MDP model incorporates the Markovian property. Many scheduling applications across many disciplines use reinforcement learning (mostly deep RL) to learn scheduling decisions. For example, the paper Learning…

ephemeral · 143
4 votes · 1 answer
Is a neural network able to optimize itself for speed?
I am experimenting with OpenAI Gym and reinforcement learning. As far as I understand, the environment waits for the agent to make a decision, so it's a sequential operation like this:
decision = agent.decide(state)
state, reward, done =…

Thomas Weller · 221
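The sequential interaction described in the question above can be sketched as a minimal reset/step loop. `SimpleEnv` and the random agent here are hypothetical stand-ins, but the calling pattern mirrors Gym's API, where the environment blocks until the agent supplies an action:

```python
import random

class SimpleEnv:
    """Toy stand-in for a Gym-style environment: episode ends after 10 steps."""
    def reset(self):
        self.t = 0
        return self.t  # initial state

    def step(self, action):
        self.t += 1
        done = self.t >= 10
        return self.t, 1.0, done  # next state, reward, episode-finished flag

env = SimpleEnv()
state = env.reset()
total_reward = 0.0
done = False
while not done:
    action = random.choice([0, 1])           # the agent's decision
    state, reward, done = env.step(action)   # environment advances one step per call
    total_reward += reward
print(total_reward)  # 10.0 for this toy episode
```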
4 votes · 1 answer
How to assign rewards in a non-Markovian environment?
I am quite new to the reinforcement learning domain and I am curious about something. It seems to be the case that the majority of current research assumes Markovian environments, that is, that future states of the process depend only upon the present…

thulungair · 43
4 votes · 0 answers
What research has been done on learning non-Markovian reward functions?
Recently, some work has been done on planning and learning in non-Markovian decision processes, that is, decision-making with temporally extended rewards. In these settings, a particular reward is received only when a particular temporal logic formula…

Gavin Rens · 41
3 votes · 1 answer
Why can we take the action $a$ from the next state $s'$ in the max part of the Q-learning update rule, if that action doesn't lead to any reward?
I'm using OpenAI's cartpole environment. First of all, is this environment not Markov?
Knowing that, my main question concerns Q-learning and off-policy methods:
For me, there is something weird in updating a Q value based on the max Q for a state…

JeanMi · 155
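The update the question asks about can be written out in a few lines. The tabular sketch below uses made-up states, Q-values, and learning rate, and shows that only $\max_{a'} Q(s', a')$ from the next state enters the target, regardless of which action is actually taken next:

```python
# Tabular Q-learning update:
#   Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
alpha, gamma = 0.5, 0.9

# Hypothetical Q-table over 2 states x 2 actions
Q = {(0, 0): 0.0, (0, 1): 0.0, (1, 0): 2.0, (1, 1): 4.0}

s, a, r, s_next = 0, 1, 1.0, 1
best_next = max(Q[(s_next, act)] for act in (0, 1))  # max over next-state actions
Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
print(Q[(0, 1)])  # 0.5 * (1.0 + 0.9 * 4.0) = 2.3
```

The max is a bootstrap estimate of future value, not a claim that the maximizing action will be executed; that is exactly what makes Q-learning off-policy.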
3 votes · 2 answers
Reinforcement Learning algorithm with rewards dependent both on previous action and current action
Problem description:
Suppose we have an environment where the reward at time step $t$ depends not only on the current action, but also on the previous action, in the following way:
if current action == previous action, you get reward $R(a,s)$
if…

FQT · 33
3 votes · 0 answers
Policy gradient: Does it use the Markov property?
To derive the policy gradient, we start by writing the equation for the probability of a certain trajectory (see, e.g., the Spinning Up tutorial):
$$
\begin{align}
P_\theta(\tau) &= P_\theta(s_0, a_0, s_1, a_1, \dots, s_T, a_T) \\
& = p(s_0) \prod_{i=0}^T…

Gerges · 131
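For reference, the product the excerpt truncates is the standard trajectory factorization (written here in the excerpt's notation, for a trajectory ending in $a_T$). This is precisely where the Markov property enters: each next-state factor conditions only on the current state and action.

```latex
P_\theta(\tau) = p(s_0) \prod_{i=0}^{T} \pi_\theta(a_i \mid s_i) \prod_{i=0}^{T-1} p(s_{i+1} \mid s_i, a_i)
```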
3 votes · 1 answer
What does the Markov assumption say about the history of state sequences?
Does the Markov assumption say that the conditional probability of the next state only depends on the current state or does it say that the conditional probability depends on a fixed finite number of previous states?
As far as I understand from the…

MScott · 445
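The two readings this question contrasts can be stated side by side: the first-order Markov assumption (the usual one in RL) conditions only on the present state, while a $k$-th order model conditions on a fixed window of the $k$ most recent states:

```latex
\begin{aligned}
\text{first order:} \quad & P(s_{t+1} \mid s_t, s_{t-1}, \dots, s_0) = P(s_{t+1} \mid s_t) \\
k\text{-th order:} \quad & P(s_{t+1} \mid s_t, s_{t-1}, \dots, s_0) = P(s_{t+1} \mid s_t, \dots, s_{t-k+1})
\end{aligned}
```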
3 votes · 2 answers
Can non-Markov environments also be deterministic?
The definition of deterministic environment I am familiar with goes as follows:
The next state of the agent depends only on the current state and the action chosen by the agent.
By exclusion, everything else would be a stochastic…

user9007131 · 63
2 votes · 0 answers
Can $Q$-learning or SARSA be thought of as a Markov chain?
I might just be overthinking a very simple question, but nonetheless the following has been bugging me a lot.
Given an MDP with non-trivial state and action sets, we can implement the SARSA algorithm to find the optimal policy or the optimal…

dezdichado · 182
2 votes · 1 answer
Is the Markov property assumed in the forward algorithm?
I'm majoring in pure linguistics (not computational), and I don't have any background in computational science or mathematics. But I happen to be taking the "Automatic Speech Recognition" course in my graduate school and am struggling with it.
I…

Jeeyoung Jeon · 23
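For context, the forward algorithm does rest on the first-order Markov assumption: the recursion computes $\alpha_t(j) = P(o_1 \dots o_t, q_t = j)$ using only the previous time step's $\alpha$ values. A minimal sketch with a made-up two-state HMM (all parameters are illustrative, not from the question):

```python
# Toy HMM: 2 hidden states, transition A, emission B, initial distribution pi.
A  = [[0.7, 0.3], [0.4, 0.6]]   # A[i][j] = P(next state = j | current state = i)
B  = [[0.9, 0.1], [0.2, 0.8]]   # B[j][o] = P(observation = o | state = j)
pi = [0.5, 0.5]

def forward(obs):
    """Return P(observation sequence) via the forward recursion."""
    alpha = [pi[j] * B[j][obs[0]] for j in range(2)]
    for o in obs[1:]:
        # Markov assumption: alpha at time t depends only on alpha at t-1,
        # so the full history never needs to be enumerated.
        alpha = [sum(alpha[i] * A[i][j] for i in range(2)) * B[j][o]
                 for j in range(2)]
    return sum(alpha)

print(forward([0, 1, 0]))
```

Without the Markov assumption, the sum over hidden-state sequences would grow exponentially in the sequence length; the recursion keeps it linear.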
1 vote · 0 answers
What limitations does the Markov property place on real time learning?
The Markov property is the dependence of a system's future state probability distribution solely on the present state, excluding any dependence on past system history.
The presence of the Markov property saves computing resource requirements in…

Douglas Daseeco · 7,423
1 vote · 0 answers
Implementation of an MDP in Python to determine when to take the action "clean"
I am trying to model the following problem as a Markov decision process.
In a steel melting shop of a steel plant, iron pipes are used. These pipes generate rust over time. Adding an anti-rusting solution can delay the rusting process. If there is…

shan · 111
1 vote · 1 answer
What is the difference between environment states and agent states in terms of Markov property?
I'm going through the David Silver RL course on YouTube. He talks about the environment's internal state $S^e_t$ and the agent's internal state $S^a_t$.
We know that state $s$ is Markov…

Stanko Kovacevic · 13