Questions tagged [semi-mdp]

For questions related to semi-Markov decision processes (SMDPs).

For more info, have a look at the paper "Between MDPs and semi-MDPs: A framework for temporal abstraction in reinforcement learning" (by Richard S. Sutton, Doina Precup and Satinder Singh).

8 questions
7 votes, 1 answer

What are options in reinforcement learning?

According to a lecture (week 10) about Reinforcement Learning [1], the concept of an option allows searching the state space of an agent much faster. The lecture was hard to follow because many new terms were introduced in a short time. For me, the…
user11571
5 votes, 1 answer

Should I model my problem as a semi-MDP?

I have a system (like a bank) that people (customers) enter according to a Poisson process, so the time between the arrivals of two consecutive customers is a random variable. The state of the problem is related to just the…
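As a small illustration of the setup described in this excerpt: under a Poisson arrival process, the gaps between consecutive arrivals are exponentially distributed. The sketch below samples such arrival times with Python's standard library. The function name `sample_arrival_times` and the values of `rate`, `horizon` and `seed` are hypothetical choices for illustration, not taken from the question.

```python
import random


def sample_arrival_times(rate, horizon, seed=0):
    """Sample arrival times of a Poisson process on [0, horizon).

    Inter-arrival gaps are drawn as independent exponential variables
    with mean 1 / rate (illustrative helper, not from the question).
    """
    rng = random.Random(seed)
    times = []
    t = rng.expovariate(rate)
    while t < horizon:
        times.append(t)
        t += rng.expovariate(rate)
    return times


# Assumed example values: on average 2 customers per time unit.
arrivals = sample_arrival_times(rate=2.0, horizon=1000.0)
gaps = [b - a for a, b in zip(arrivals, arrivals[1:])]
mean_gap = sum(gaps) / len(gaps)  # should be close to 1 / rate
```

Because the time between decision points is random rather than fixed, a formulation of this kind is often what motivates considering a semi-MDP.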
4 votes, 1 answer

How to apply or extend the $Q(\lambda)$ algorithm to semi-MDPs?

I want to model an SMDP in which time is discretized, the transition time between two states follows an exponential distribution, and there is no reward during a transition. What are the differences between $Q(\lambda)$…
3 votes, 2 answers

How to model a multi-agent reinforcement learning problem where actions of different agents can take different durations?

I am confused, at a conceptual level, about how to model a multi-agent reinforcement learning problem in which the actions performed by each agent take different durations to complete. This means that a certain action is performed…
3 votes, 0 answers

Relationship between the reward rate and the sampled reward in a Semi-Markov Decision Process

In the paper "Reinforcement learning methods for continuous-time Markov decision problems", the authors provide the following update rule for the Q-learning algorithm when applied to Semi-Markov Decision Processes (SMDPs): $Q^{(k+1)}(x,a) =…
2 votes, 1 answer

Is my understanding of the differences between MDP, Semi MDP and POMDP correct?

I just wanted to confirm that my understanding of the different Markov Decision Processes is correct, because they are the fundamentals of reinforcement learning. Also, I read a few literature sources, and some are not consistent with each other.…
2 votes, 1 answer

Updating action-value functions in Semi-Markov Decision Process and Reinforcement Learning

Suppose that the transition time between two states is a random variable (for example, following an unknown exponential distribution), and that there is no reward between two arrivals. If $\tau$ (a real number, not an integer) denotes the time between two…
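For orientation only, here is a minimal sketch of one common way to write a Q-learning update when the sojourn time $\tau$ is real-valued: the next-state value is discounted by $e^{-\beta\tau}$ instead of a fixed per-step factor. This is not necessarily the update the question has in mind. It assumes continuous-time discounting with a rate `beta`, a lump-sum `reward` received at decision time, and no reward accrued during the sojourn; the function name `smdp_q_update` and all numeric values are hypothetical.

```python
import math
from collections import defaultdict


def smdp_q_update(Q, s, a, reward, tau, s_next, actions, alpha=0.1, beta=0.05):
    """One tabular Q-learning step with a duration-dependent discount.

    Assumptions (not from the question): `reward` is received at decision
    time, nothing accrues during the sojourn of length `tau`, and future
    value is discounted continuously at rate `beta`.
    """
    discount = math.exp(-beta * tau)  # plays the role of gamma ** tau
    best_next = max(Q[(s_next, b)] for b in actions)
    target = reward + discount * best_next
    Q[(s, a)] += alpha * (target - Q[(s, a)])
    return Q[(s, a)]


# Usage with made-up states, actions and numbers.
Q = defaultdict(float)
Q[("s1", "x")] = 1.0
new_value = smdp_q_update(
    Q, s="s0", a="x", reward=2.0, tau=1.5, s_next="s1", actions=["x", "y"]
)
```

With $\tau = 1$ and $\gamma = e^{-\beta}$ this reduces to the familiar fixed-step update, which is one way to sanity-check the sketch.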
1 vote, 1 answer

Bellman optimality equation in semi Markov decision process

I wrote a Python program for a simple inventory control problem where decision epochs are equally spaced (every morning) and there is no lead time for orders (the time between submitting an order and receiving it). I use the Bellman…