Questions tagged [finite-markov-decision-process]

6 questions
3 votes, 2 answers

Is Monte Carlo Tree Search appropriate for problems with large state and action spaces?

I'm doing research on a finite-horizon Markov decision process with $t=1, \dots, 40$ periods. At every time step $t$, the (only) agent has to choose an action $a(t) \in A(t)$ while in state $s(t) \in S(t)$. The chosen action $a(t)$ in…
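
For readers landing on this tag, a compact UCT-style sketch of MCTS for a generic finite-horizon MDP may help frame the question. The environment interface below, `actions(s, t)` returning the legal set $A(t)$ and `step(s, t, a)` returning a next state and reward, is a hypothetical placeholder, and transitions are assumed deterministic to keep the backup simple; this is a sketch of the general technique, not the asker's model.

```python
# Minimal UCT-style MCTS sketch for a generic finite-horizon MDP with T = 40.
# actions(s, t) and step(s, t, a) are hypothetical, user-supplied functions;
# deterministic dynamics and non-empty action sets are assumed.
import math
import random

T = 40  # horizon

class Node:
    def __init__(self, state, t, parent=None, action=None, reward=0.0):
        self.state, self.t = state, t
        self.parent, self.action, self.reward = parent, action, reward
        self.children, self.untried = [], None
        self.visits, self.value = 0, 0.0   # value = mean return from this node onward

def uct_search(root_state, actions, step, n_iter=1000, c=1.4):
    root = Node(root_state, 0)
    root.untried = list(actions(root_state, 0))
    for _ in range(n_iter):
        node = root
        # 1. Selection: descend while fully expanded and non-terminal.
        while node.t < T and not node.untried and node.children:
            node = max(node.children, key=lambda ch: ch.reward + ch.value
                       + c * math.sqrt(math.log(node.visits) / ch.visits))
        # 2. Expansion: try one untried action, if any.
        if node.t < T and node.untried:
            a = node.untried.pop()
            s2, r = step(node.state, node.t, a)
            child = Node(s2, node.t + 1, parent=node, action=a, reward=r)
            child.untried = list(actions(s2, child.t)) if child.t < T else []
            node.children.append(child)
            node = child
        # 3. Simulation: random rollout from the new node to the horizon.
        total, s, t = 0.0, node.state, node.t
        while t < T:
            s, r = step(s, t, random.choice(actions(s, t)))
            total += r
            t += 1
        # 4. Backpropagation: each node's value estimates the return from that
        #    node onward, so add the edge reward when moving up to the parent.
        while node is not None:
            node.visits += 1
            node.value += (total - node.value) / node.visits
            total += node.reward
            node = node.parent
    return max(root.children, key=lambda ch: ch.visits).action  # most-visited root action
```

The per-iteration cost scales with the rollout length $T$ and the branching factor $|A(t)|$, which is exactly what makes very large state and action spaces the crux of the question.
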
2 votes, 1 answer

Continuous state and continuous action Markov decision process time complexity estimate: backward induction vs. policy gradient method (RL)

Model description: model-based (the entire model is assumed known) Markov decision process. Time ($t$): finite-horizon discrete time with a discount factor. State ($x_t$): continuous multi-dimensional state. Action ($a_t$): continuous multi-dimensional…
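
As a point of reference for the complexity comparison, backward induction on a discretized version of such a model costs on the order of $T \cdot |X_{\text{grid}}| \cdot |A_{\text{grid}}|$ reward/transition evaluations, and the grid sizes grow exponentially with the state and action dimensions. A minimal sketch, with purely hypothetical one-dimensional dynamics and reward (`transition_fn`, `reward_fn`), not the asker's model:

```python
# Backward induction over a discretized continuous-state, continuous-action MDP.
# Cost here is O(T * |x_grid| * |a_grid|) evaluations, which is what makes
# naive discretization expensive in higher dimensions.
import numpy as np

T = 10                                   # horizon
gamma = 0.95                             # discount factor
x_grid = np.linspace(-1.0, 1.0, 101)     # 1-D state discretization (hypothetical)
a_grid = np.linspace(-1.0, 1.0, 21)      # 1-D action discretization (hypothetical)

def reward_fn(x, a):
    return -(x**2 + 0.1 * a**2)          # placeholder quadratic cost

def transition_fn(x, a):
    return np.clip(0.9 * x + a, -1.0, 1.0)  # placeholder deterministic dynamics

V = np.zeros((T + 1, len(x_grid)))       # terminal value V[T, :] = 0
for t in range(T - 1, -1, -1):
    for i, x in enumerate(x_grid):
        q_vals = []
        for a in a_grid:
            x_next = transition_fn(x, a)
            j = np.abs(x_grid - x_next).argmin()   # nearest-neighbour lookup
            q_vals.append(reward_fn(x, a) + gamma * V[t + 1, j])
        V[t, i] = max(q_vals)
```
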
1 vote, 0 answers

How to generalize a finite MDP to a general MDP?

Suppose, for simplicity's sake, that we are in a discrete time domain with the action set being the same for all states $S \in \mathcal{S}$. Thus, in a finite Markov decision process, the sets $\mathcal{A}$, $\mathcal{S}$, and $\mathcal{R}$ have a finite…
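
For reference, the finiteness condition the excerpt alludes to (in Sutton and Barto's notation) is that the dynamics reduce to a finite table of probabilities:

```latex
% Finite MDP: \mathcal{S}, \mathcal{A}, \mathcal{R} are all finite, so the
% dynamics are fully specified by a finite table
p(s', r \mid s, a) \doteq \Pr\{S_t = s', R_t = r \mid S_{t-1} = s, A_{t-1} = a\},
\qquad \sum_{s' \in \mathcal{S}} \sum_{r \in \mathcal{R}} p(s', r \mid s, a) = 1
\quad \text{for all } s \in \mathcal{S},\ a \in \mathcal{A}.
```
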
1 vote, 0 answers

Recursive least squares (RLS) for mini-batches

For my application I am considering a learning problem where I first simulate a batch of, say, $n$ episodes and then carry out the recursive least squares update, similar to $TD(1)$. I know that RLS can be used to update parameters being learned as…
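
A minimal sketch of what an RLS update applied to a simulated mini-batch could look like, assuming a linear model; the feature vectors and targets below are hypothetical placeholders, and the standard rank-one (Sherman-Morrison) update is simply applied once per sample in the batch:

```python
# Recursive least squares (RLS) applied sample-by-sample to a mini-batch.
import numpy as np

def rls_update(theta, P, batch, lam=1.0):
    """theta: (d,) parameters; P: (d, d) inverse-covariance (gain) matrix;
    batch: iterable of (x, y) pairs; lam: forgetting factor (1.0 = none)."""
    for x, y in batch:
        x = np.asarray(x, dtype=float)
        Px = P @ x
        k = Px / (lam + x @ Px)          # gain vector
        err = y - theta @ x              # prediction error
        theta = theta + k * err          # parameter update
        P = (P - np.outer(k, Px)) / lam  # rank-one (Sherman-Morrison) downdate
    return theta, P

# Example usage with random placeholder data:
d = 4
theta = np.zeros(d)
P = np.eye(d) * 1e3                      # large initial P = weak prior
rng = np.random.default_rng(0)
batch = [(rng.normal(size=d), rng.normal()) for _ in range(32)]
theta, P = rls_update(theta, P, batch)
```
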
0 votes, 1 answer

How to formulate the discounted return in CartPole?

I am trying to formulate a problem that aims to prolong the lifetime of the simulation, the same as in the CartPole problem. I am aware that there are two types of return: the finite-horizon undiscounted return (used for episodic problems) $G = \sum_{t=0}^T…
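
For reference, the two standard return definitions the excerpt is contrasting are:

```latex
% Finite-horizon undiscounted return (episodic tasks):
G = \sum_{t=0}^{T} r_t
% Infinite-horizon discounted return, with discount factor 0 \le \gamma < 1:
G = \sum_{t=0}^{\infty} \gamma^{t} r_t
```
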
0 votes, 1 answer

Converging to the wrong optimal policy when the agent is given more choices

I am a bit new to reinforcement learning, so I apologize if I am asking something obvious. I have written a small piece of code to find the optimal policy for a 5x5 grid problem. Scenario 1: the agent is only given two choices (Up,…
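
A minimal value-iteration sketch on a 5x5 grid with a restricted two-action set is one way to reproduce the kind of scenario described; the goal location, rewards, and action set below are hypothetical illustrations, not taken from the question.

```python
# Value iteration on a 5x5 gridworld with a deliberately restricted action set,
# to illustrate how limiting the actions changes the resulting greedy policy.
import numpy as np

N = 5
gamma = 0.9
goal = (0, 4)                                # hypothetical goal cell
ACTIONS = {"Up": (-1, 0), "Right": (0, 1)}   # restricted two-action set

def step(r, c, move):
    dr, dc = ACTIONS[move]
    nr, nc = min(max(r + dr, 0), N - 1), min(max(c + dc, 0), N - 1)
    reward = 10.0 if (nr, nc) == goal else -1.0
    return (nr, nc), reward

V = np.zeros((N, N))
for _ in range(200):                         # value-iteration sweeps
    V_new = np.zeros_like(V)
    for r in range(N):
        for c in range(N):
            if (r, c) == goal:
                continue                     # goal is terminal
            V_new[r, c] = max(
                reward + gamma * V[nr, nc]
                for (nr, nc), reward in (step(r, c, a) for a in ACTIONS)
            )
    V = V_new

# Greedy policy with respect to the converged values.
policy = [[max(ACTIONS, key=lambda a: step(r, c, a)[1] + gamma * V[step(r, c, a)[0]])
           if (r, c) != goal else "G" for c in range(N)] for r in range(N)]
```
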