Questions tagged [markov-decision-process]

For questions related to the concept of Markov decision process (MDP), which is a mathematical framework for modeling decision making in situations where outcomes are partly random and partly under the control of a decision-maker. The concept of MDP is useful for studying optimization problems solved via dynamic programming and reinforcement learning.
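To make the framework concrete, here is a minimal sketch (not tied to any particular question below) of a small finite MDP and one dynamic-programming solution method, value iteration. The two-state example, its transition probabilities, and its rewards are all hypothetical.

```python
# A minimal sketch of a finite MDP and value iteration, for orientation only.
# The tiny two-state MDP below is hypothetical, not taken from any question.
import numpy as np

states = [0, 1]
actions = [0, 1]
gamma = 0.9  # discount factor

# P[s][a] is a list of (next_state, probability); R[s][a] is the expected reward.
P = {
    0: {0: [(0, 0.8), (1, 0.2)], 1: [(1, 1.0)]},
    1: {0: [(0, 1.0)],           1: [(0, 0.5), (1, 0.5)]},
}
R = {
    0: {0: 0.0, 1: 1.0},
    1: {0: 0.0, 1: 2.0},
}

# Value iteration: repeatedly apply the Bellman optimality backup until the
# value estimates stop changing.
V = np.zeros(len(states))
for _ in range(1000):
    V_new = np.array([
        max(R[s][a] + gamma * sum(p * V[s2] for s2, p in P[s][a]) for a in actions)
        for s in states
    ])
    converged = np.max(np.abs(V_new - V)) < 1e-8
    V = V_new
    if converged:
        break

print(V)  # approximate optimal state values
```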

187 questions
21
votes
2 answers

How to define states in reinforcement learning?

I am studying reinforcement learning and its variants. I am starting to get an understanding of how the algorithms work and how they apply to an MDP. What I don't understand is the process of defining the states of the MDP. In most examples…
16
votes
1 answer

How to stay an up-to-date researcher in the ML/RL community?

As a student who wants to work on machine learning, I would like to know how to start my studies and how to keep up with the field afterwards. For example, I am willing to work on RL and MAB problems, but there is a huge literature on…
12
votes
2 answers

Is there a fundamental difference between an environment being stochastic and being partially observable?

In AI literature, being deterministic vs. stochastic and being fully vs. partially observable are usually considered two distinct properties of the environment. I'm confused about this because what appears random can be described by hidden…
10
votes
3 answers

How can you represent the state and action spaces for a card game in the case of a variable number of cards and actions?

I know how a machine can learn to play Atari games (Breakout): Playing Atari with Deep Reinforcement Learning. With the same technique, it is even possible to play FPS games (Doom): Playing FPS Games with Deep Reinforcement Learning. Further studies even…
10
votes
1 answer

Can Q-learning be used in a POMDP?

Can Q-learning (and SARSA) be directly used in a Partially Observable Markov Decision Process (POMDP)? If not, why not? My intuition is that the policies learned will be terrible because of partial observability. Are there ways to transform these…
9
votes
1 answer

What are some resources on continuous state and action spaces MDPs for reinforcement learning?

Most introductions to the field of MDPs and reinforcement learning focus exclusively on domains where the state and action variables are integers (and finite). This way we are introduced quickly to Value Iteration, Q-Learning, and the like. However, the…
8
votes
1 answer

How to fill in missing transitions when sampling an MDP transition table?

I have a simulator modelling a relatively complex scenario. I extract ~12 discrete features from the simulator state, which form the basis for my MDP state space. Suppose I am estimating the transition table for an MDP by running a large number of…
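One common way to handle state-action pairs with few or no samples (a hedged sketch, not necessarily what the asker or any answer proposes) is additive (Laplace) smoothing over the empirical counts, so that every next state keeps a small non-zero probability. The constant `alpha` and the data layout below are illustrative assumptions.

```python
# Estimating P(s' | s, a) from sampled transitions with additive smoothing,
# so (s, a) pairs with few or no observations still give a valid distribution.
from collections import defaultdict

def estimate_transition_table(samples, n_states, alpha=1.0):
    """samples: iterable of (s, a, s_next) tuples with integer state ids."""
    counts = defaultdict(lambda: defaultdict(float))  # counts[(s, a)][s_next]
    for s, a, s_next in samples:
        counts[(s, a)][s_next] += 1.0

    def prob(s, a, s_next):
        total = sum(counts[(s, a)].values())
        # Additive smoothing: unobserved transitions get a small uniform mass.
        return (counts[(s, a)][s_next] + alpha) / (total + alpha * n_states)

    return prob

# Usage with made-up samples over 3 states:
p = estimate_transition_table([(0, 0, 1), (0, 0, 1), (0, 0, 2)], n_states=3)
print(p(0, 0, 1), p(0, 0, 0))  # observed vs. never-observed next state
```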
8
votes
1 answer

What is ergodicity in a Markov Decision Process (MDP)?

I have read about the concept of ergodicity in the safe RL paper by Moldovan (section 3.2) and in the RL book by Sutton and Barto (chapter 10.3, 2nd paragraph). The first one says that "a belief over MDPs is ergodic if and only if any state is reachable from any…
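One common formalisation of this reachability condition (my paraphrase of the standard notion, not a quote from either source) is that every state can be reached from every other state under some policy in finitely many steps: $$\forall s, s' \in \mathcal{S} \;\; \exists \pi, \; t < \infty : \; \Pr\left(S_t = s' \mid S_0 = s, \pi\right) > 0.$$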
7
votes
2 answers

What is a time-step in a Markov Decision Process?

The "discounted sum of future rewards" (or return) using discount factor $\gamma$ is $$\gamma^1 r_1 +\gamma^2 r_2 + \gamma^3 r_2 + \dots \tag{1}\label{1}$$ where $r_i$ is the reward received at the $i$th time-step. I am confused as to what…
6
votes
2 answers

When is the Markov decision process not adequate for goal-directed learning tasks?

In the book Reinforcement Learning: An Introduction (Sutton and Barto, 2018), the authors ask in Exercise 3.2: Is the MDP framework adequate to usefully represent all goal-directed learning tasks? Can you think of any clear exceptions? I thought…
6
votes
0 answers

Proof that there always exists a dominating policy in an MDP

I think that it is common knowledge that for any infinite horizon discounted MDP $(S, A, P, r, \gamma)$, there always exists a dominating policy $\pi$, i.e. a policy $\pi$ such that for all policies $\pi'$: $$V_\pi (s) \geq V_{\pi'}(s) \quad…
6
votes
1 answer

Reward interpolation between MDPs. Will an optimal policy on both ends stay optimal inside the interval?

Say I've got two Markov Decision Processes (MDPs): $$\mathcal{M}_0 = (\mathcal{S}, \mathcal{A}, P, R_0),\quad\text{and}\quad\mathcal{M}_1 = (\mathcal{S}, \mathcal{A}, P, R_1)$$ Both have the same set of states and actions, and the transition…
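The excerpt is cut off before the interpolation is defined; a natural reading (an assumption on my part) is a convex combination of the two reward functions, $$\mathcal{M}_\alpha = (\mathcal{S}, \mathcal{A}, P, R_\alpha), \qquad R_\alpha = (1-\alpha) R_0 + \alpha R_1, \qquad \alpha \in [0, 1],$$ with the question being whether a policy that is optimal in both $\mathcal{M}_0$ and $\mathcal{M}_1$ remains optimal for every $\alpha$ in between.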
6
votes
1 answer

What techniques are used to make MDP discrete state space manageable?

Generating a discretized state space for an MDP (Markov Decision Process) model seems to suffer from the curse of dimensionality. Suppose my state has a few simple features: Feeling: Happy/Neutral/Sad; Feeling: Hungry/Neither/Full; Food left:…
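The blow-up the asker describes is just the product of the per-feature cardinalities; below is a two-line check with hypothetical feature names and sizes (following the excerpt's pattern of small categorical features).

```python
# The joint discrete state space grows as the product of per-feature sizes.
# Feature names and counts here are hypothetical.
from math import prod

feature_sizes = {"feeling": 3, "hunger": 3, "food_left": 5, "energy": 4}
n_states = prod(feature_sizes.values())
print(n_states)  # 3 * 3 * 5 * 4 = 180; each extra feature multiplies this
```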
6
votes
1 answer

Interesting examples of discrete stochastic games

Stochastic games (SGs) are a generalization of MDPs to multiple agents. As in this previous question on MDPs, are there any interesting examples of zero-sum, discrete SGs, preferably with small state and action spaces? I'm hoping to use such examples as benchmarks, but…
6
votes
2 answers

Are perfect and imperfect information games modelled as fully and partially observable environments, respectively?

In perfect information games, the agent can see all the moves performed in the past. Moreover, it can observe the next action that the opponent will take. In this case, can we say that perfect information games are actually a…