
As far as I know, MDPs are independent of the past. But the definition says that the same policy should always take the same action depending on the state.

What if I define my state as the current "main" state + previous decisions?

For example, in poker the "main" state would be my cards and the pot, plus all previous information about the game.

Would this still be an MDP or not?

Miemels
  • Yeah, it's possible. If we can abstract the past state into the current state, then the MDP property still holds. – rai.skumar Jan 17 '18 at 05:51
  • "But the definition says that the same policy should always take the same action depending on the state" --> this does not seem right, considering stochastic policies such as epsilon-greedy. – heyzude May 24 '22 at 05:46

2 Answers


It's not totally clear from your description, but it sounds like you may be onto something like an Additive Markov Chain.

mindcrime

MDPs are not independent of the past, but the future, starting from the current state, is conditionally independent of the past; i.e. the probability of the next state given all previous states is the same as the probability of the next state given only the current state.
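In symbols (a standard formulation of the Markov property; the notation here is my addition, not part of the original answer):

$$P(S_{t+1} = s' \mid S_t, S_{t-1}, \dots, S_0) = P(S_{t+1} = s' \mid S_t)$$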

Any state representation that encodes the full history is an MDP, because reading the history that is coded into your current state is not the same as looking back at previous states, so the Markov property holds. The problem is that you get an explosion of states, since the state has to encode every possible trajectory, which is infeasible most of the time.
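As a minimal sketch of what "coding the history into the state" could look like, here is a hypothetical Python wrapper (the AugmentedState type and the step helper are my own illustration, not from the original answer):

```python
from dataclasses import dataclass
from typing import Any, Tuple

@dataclass(frozen=True)
class AugmentedState:
    main_state: Tuple[Any, ...]        # e.g. (my_cards, pot) in the poker example
    decision_history: Tuple[Any, ...]  # every previous decision, in order

def step(state: AugmentedState, action: Any, new_main: Tuple[Any, ...]) -> AugmentedState:
    # The environment supplies the new "main" state; we record it and append
    # the chosen action to the history carried inside the state itself, so the
    # next state depends only on this augmented state (Markov property holds).
    return AugmentedState(new_main, state.decision_history + (action,))

# Tiny usage example with made-up poker-like values:
s0 = AugmentedState(main_state=("Ks Kd", 100), decision_history=())
s1 = step(s0, action="raise", new_main=("Ks Kd", 200))
print(s1.decision_history)  # ('raise',)
```

The downside mentioned above shows up directly in this sketch: the number of distinct decision_history tuples grows exponentially with the episode length, so the augmented state space quickly becomes huge.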

What if I define my state as the current "main" state + previous decisions?

For example, in poker the "main" state would be my cards and the pot, plus all previous information about the game.

Yes, it is still a Markov Decision Process (MDP).