Questions tagged [finite-markov-decision-process]
6 questions
3 votes, 2 answers
Is Monte Carlo Tree Search appropriate for problems with large state and action spaces?
I'm doing research on a finite-horizon Markov decision process with $t=1, \dots, 40$ periods. In every time step $t$, the (only) agent has to choose an action $a(t) \in A(t)$ while in state $s(t) \in S(t)$. The chosen action $a(t)$ in…

D. B.
2 votes, 1 answer
Continuous-state, continuous-action Markov decision process time complexity estimate: backward induction vs. policy gradient method (RL)
Model description: model-based (the entire model is assumed known) Markov decision process.
Time ($t$): finite-horizon, discrete time with a discount factor
State ($x_t$): continuous, multi-dimensional state
Action ($a_t$): continuous, multi-dimensional…

leodongxu
1 vote, 0 answers
How to generalize finite MDP to general MDP?
Suppose, for simplicity's sake, that we are in a discrete-time domain with the action set being the same for all states $S \in \mathcal{S}$. Thus, in a finite Markov decision process, the sets $\mathcal{A}$, $\mathcal{S}$, and $\mathcal{R}$ have a finite…

gvgramazio
1 vote, 0 answers
Recursive Least squares (RLS) for mini batch
For my application, I am considering a learning problem where I first simulate a batch of episodes, say $n$, and then carry out the recursive least squares update, similar to $TD(1)$.
I know that RLS can be used to update parameters being learned as…

Prakash Gawas
0 votes, 1 answer
How to formulate discounted return in cartpole?
I am trying to formulate a problem that aims to prolong the lifetime of the simulation, as in the CartPole problem. I am aware that there are two types of return:
finite horizon undiscounted return (used for episodic problems)
$G = \sum_{t=0}^T…

Ngoc Bui
0 votes, 1 answer
Converging to a wrong optimal policy if the agent is given more choices
I am a bit new to reinforcement learning, so I am extremely sorry if I am asking something obvious. I have written a small piece of code to find the optimal policy for a 5x5 grid problem.
Scenario 1. The agent is only given two choices (Up,…

Tyrion