Highest Voted 'finite-markov-decision-process' Questions - Artificial Intelligence Stack Exchange

3

votes

2 answers

Is Monte Carlo Tree Search appropriate for problems with large state and action spaces?

I'm doing research on a finite-horizon Markov decision process with $t=1, \dots, 40$ periods. In every time step $t$, the (only) agent has to choose an action $a(t) \in A(t)$, while the agent is in state $s(t) \in S(t)$. The chosen action $a(t)$ in…

asked Jan 09 '19 at 12:25

D. B.

101
6

2

votes

1 answer

Continuous state and continuous action Markov decision process time complexity estimate: backward induction VS policy gradient method (RL)

Model description: a model-based Markov decision process (the entire model is assumed known). Time ($t$): finite horizon, discrete time, with a discount factor. State ($x_t$): continuous, multi-dimensional. Action ($a_t$): continuous, multi-dimensional…

reinforcement-learning optimization time-complexity finite-markov-decision-process interpolation

asked Jun 22 '20 at 20:51

leodongxu

21
2

1

vote

0 answers

How to generalize finite MDP to general MDP?

Suppose, for simplicity's sake, that we are in a discrete-time domain with the action set being the same for all states $S \in \mathcal{S}$. Thus, in a finite Markov Decision Process, the sets $\mathcal{A}$, $\mathcal{S}$, and $\mathcal{R}$ have a finite…

markov-decision-process continuous-action-spaces transition-model finite-markov-decision-process continuous-state-spaces

asked Nov 23 '18 at 13:25

gvgramazio

696
2
7
19

1

vote

0 answers

Recursive Least squares (RLS) for mini batch

For my application, I am considering a learning problem where I first simulate a number of episodes, say $n$, and then carry out the recursive least squares update, similar to $TD(1)$. I know that RLS can be used to update parameters being learned as…

reinforcement-learning function-approximation weights temporal-difference-methods finite-markov-decision-process

asked Jul 27 '21 at 06:17

Prakash Gawas

11
1

0

votes

1 answer

How to formulate discounted return in cartpole?

I am trying to formulate a problem that aims to prolong the lifetime of the simulation, the same as the Cartpole problem. I am aware that there are two types of return: finite horizon undiscounted return (used for episodic problems) $G = \sum_{t=0}^T…

reinforcement-learning deep-rl policy-gradients return finite-markov-decision-process

asked May 29 '21 at 10:52

Ngoc Bui

3
1

0

votes

1 answer

Converging to a wrong optimal policy if the agent is given more choices

I am a bit new to reinforcement learning, so I am extremely sorry if I am asking something obvious. I have written a small piece of code to find the optimal policy for a 5x5 grid problem. Scenario 1. The agent is only given two choices (Up,…

reinforcement-learning markov-decision-process bellman-equations policy-iteration finite-markov-decision-process

asked May 21 '21 at 13:22

Tyrion

3
2
