Questions tagged [markov-decision-process]

For questions related to the concept of Markov decision process (MDP), which is a mathematical framework for modeling decision making in situations where outcomes are partly random and partly under the control of a decision-maker. The concept of MDP is useful for studying optimization problems solved via dynamic programming and reinforcement learning.
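To make the framework concrete, here is a minimal sketch (not tied to any particular question below) of a small finite MDP and one dynamic-programming solution method, value iteration. The two-state example, its transition probabilities, and its rewards are all hypothetical.

```python
# A minimal sketch of a finite MDP and value iteration, for orientation only.
# The tiny two-state MDP below is hypothetical, not taken from any question.
import numpy as np

states = [0, 1]
actions = [0, 1]
gamma = 0.9  # discount factor

# P[s][a] is a list of (next_state, probability); R[s][a] is the expected reward.
P = {
    0: {0: [(0, 0.8), (1, 0.2)], 1: [(1, 1.0)]},
    1: {0: [(0, 1.0)],           1: [(0, 0.5), (1, 0.5)]},
}
R = {
    0: {0: 0.0, 1: 1.0},
    1: {0: 0.0, 1: 2.0},
}

# Value iteration: repeatedly apply the Bellman optimality backup until the
# value estimates stop changing.
V = np.zeros(len(states))
for _ in range(1000):
    V_new = np.array([
        max(R[s][a] + gamma * sum(p * V[s2] for s2, p in P[s][a]) for a in actions)
        for s in states
    ])
    converged = np.max(np.abs(V_new - V)) < 1e-8
    V = V_new
    if converged:
        break

print(V)  # approximate optimal state values
```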

187 questions
21
votes
2 answers

How to define states in reinforcement learning?

I am studying reinforcement learning and its variants. I am starting to get an understanding of how the algorithms work and how they apply to an MDP. What I don't understand is the process of defining the states of the MDP. In most examples…
16
votes
1 answer

How to stay an up-to-date researcher in the ML/RL community?

As a student who wants to work on machine learning, I would like to know how to start my studies and how to keep up with the field afterwards. For example, I am willing to work on RL and MAB problems, but there is a huge literature on…
12
votes
2 answers

Is there a fundamental difference between an environment being stochastic and being partially observable?

In AI literature, being deterministic vs. stochastic and being fully vs. partially observable are usually considered two distinct properties of the environment. I'm confused about this because what appears random can be described by hidden…
10
votes
3 answers

How can you represent the state and action spaces for a card game in the case of a variable number of cards and actions?

I know how a machine can learn to play Atari games (Breakout): Playing Atari with Deep Reinforcement Learning. With the same technique, it is even possible to play FPS games (Doom): Playing FPS Games with Deep Reinforcement Learning. Further studies even…
10
votes
1 answer

Can Q-learning be used in a POMDP?

Can Q-learning (and SARSA) be directly used in a Partially Observable Markov Decision Process (POMDP)? If not, why not? My intuition is that the policies learned will be terrible because of partial observability. Are there ways to transform these…
9
votes
1 answer

What are some resources on continuous state and action spaces MDPs for reinforcement learning?

Most introductions to the field of MDPs and reinforcement learning focus exclusively on domains where the state and action variables are integers (and finite). This way we are introduced quickly to Value Iteration, Q-Learning, and the like. However, the…
8
votes
1 answer

How to fill in missing transitions when sampling an MDP transition table?

I have a simulator modelling a relatively complex scenario. I extract ~12 discrete features from the simulator state, which form the basis for my MDP state space. Suppose I am estimating the transition table for an MDP by running a large number of…
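One common way to handle state-action pairs with few or no samples (a hedged sketch, not necessarily what the asker or any answer proposes) is additive (Laplace) smoothing over the empirical counts, so that every next state keeps a small non-zero probability. The constant `alpha` and the data layout below are illustrative assumptions.

```python
# Estimating P(s' | s, a) from sampled transitions with additive smoothing,
# so (s, a) pairs with few or no observations still give a valid distribution.
from collections import defaultdict

def estimate_transition_table(samples, n_states, alpha=1.0):
    """samples: iterable of (s, a, s_next) tuples with integer state ids."""
    counts = defaultdict(lambda: defaultdict(float))  # counts[(s, a)][s_next]
    for s, a, s_next in samples:
        counts[(s, a)][s_next] += 1.0

    def prob(s, a, s_next):
        total = sum(counts[(s, a)].values())
        # Additive smoothing: unobserved transitions get a small uniform mass.
        return (counts[(s, a)][s_next] + alpha) / (total + alpha * n_states)

    return prob

# Usage with made-up samples over 3 states:
p = estimate_transition_table([(0, 0, 1), (0, 0, 1), (0, 0, 2)], n_states=3)
print(p(0, 0, 1), p(0, 0, 0))  # observed vs. never-observed next state
```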
8
votes
1 answer

What is ergodicity in a Markov Decision Process (MDP)?

I have read about the concept of ergodicity in the safe RL paper by Moldovan (section 3.2) and in the RL book by Sutton and Barto (chapter 10.3, 2nd paragraph). The first one says that "a belief over MDPs is ergodic if and only if any state is reachable from any…
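One common formalisation of this reachability condition (my paraphrase of the standard notion, not a quote from either source) is that every state can be reached from every other state under some policy in finitely many steps: $$\forall s, s' \in \mathcal{S} \;\; \exists \pi, \; t < \infty : \; \Pr\left(S_t = s' \mid S_0 = s, \pi\right) > 0.$$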
7
votes
2 answers

What is a time-step in a Markov Decision Process?

The "discounted sum of future rewards" (or return) using discount factor $\gamma$ is $$\gamma^1 r_1 +\gamma^2 r_2 + \gamma^3 r_2 + \dots \tag{1}\label{1}$$ where $r_i$ is the reward received at the $i$th time-step. I am confused as to what…
6
votes
2 answers

When is the Markov decision process not adequate for goal-directed learning tasks?

In the book Reinforcement Learning: An Introduction (Sutton and Barto, 2018), the authors ask in Exercise 3.2: Is the MDP framework adequate to usefully represent all goal-directed learning tasks? Can you think of any clear exceptions? I thought…
6
votes
0 answers

Proof that there always exists a dominating policy in an MDP

I think that it is common knowledge that for any infinite horizon discounted MDP $(S, A, P, r, \gamma)$, there always exists a dominating policy $\pi$, i.e. a policy $\pi$ such that for all policies $\pi'$: $$V_\pi (s) \geq V_{\pi'}(s) \quad…
6
votes
1 answer

Reward interpolation between MDPs. Will an optimal policy on both ends stay optimal inside the interval?

Say I've got two Markov Decision Processes (MDPs): $$\mathcal{M}_0 = (\mathcal{S}, \mathcal{A}, P, R_0),\quad\text{and}\quad\mathcal{M}_1 = (\mathcal{S}, \mathcal{A}, P, R_1)$$ Both have the same set of states and actions, and the transition…
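The excerpt is cut off before the interpolation is defined; a natural reading (an assumption on my part) is a convex combination of the two reward functions, $$\mathcal{M}_\alpha = (\mathcal{S}, \mathcal{A}, P, R_\alpha), \qquad R_\alpha = (1-\alpha) R_0 + \alpha R_1, \qquad \alpha \in [0, 1],$$ with the question being whether a policy that is optimal in both $\mathcal{M}_0$ and $\mathcal{M}_1$ remains optimal for every $\alpha$ in between.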
6
votes
1 answer

What techniques are used to make MDP discrete state space manageable?

Generating a discretized state space for an MDP (Markov Decision Process) model seems to suffer from the curse of dimensionality. Suppose my state has a few simple features: Feeling: Happy/Neutral/Sad; Feeling: Hungry/Neither/Full; Food left:…
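The blow-up the asker describes is just the product of the per-feature cardinalities; below is a two-line check with hypothetical feature names and sizes (following the excerpt's pattern of small categorical features).

```python
# The joint discrete state space grows as the product of per-feature sizes.
# Feature names and counts here are hypothetical.
from math import prod

feature_sizes = {"feeling": 3, "hunger": 3, "food_left": 5, "energy": 4}
n_states = prod(feature_sizes.values())
print(n_states)  # 3 * 3 * 5 * 4 = 180; each extra feature multiplies this
```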
6
votes
1 answer

Interesting examples of discrete stochastic games

Stochastic games (SGs) are a generalization of MDPs to multiple agents. As in this previous question on MDPs, are there any interesting examples of zero-sum, discrete SGs, preferably with small state and action spaces? I'm hoping to use such examples as benchmarks, but…
6
votes
2 answers

Are perfect and imperfect information games modelled as fully and partially observable environments, respectively?

In perfect information games, the agent can see all the moves performed in the past. Moreover, it can observe the next action that the opponent will take. In this case, can we say that perfect information games are actually a…