How should I implement the state transition when it is a Gaussian distribution?

Question

I am reading the paper Anxiety, Avoidance and Sequential Evaluation and am confused about the implementation of a specific lab study. Namely, the authors model the so-called Balloon task using a simple MDP, for which the description is below:

My confusion is the following sentence:

...The probability of this bad transition was modeled using normal density function, with parameters $N(16, 0.5)$

But the fact that this is a continuous normal distribution has me stumped. In MDPs, there is usually a nice, discrete transition matrix, so there is no ambiguity about how to implement it. For instance, if they said the transition to a bad state is modeled by a Bernoulli random variable with parameter $p$, then it would be clear how to implement it. I would do something like:

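import random
def step(curr_state, curr_action, p):
    # Bernoulli(p) draw: enter the bad state with probability p
    if random.random() < p:
        return "bad state"
    return curr_state  # placeholder: otherwise the usual dynamics would apply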

But they are using a normal random variable for this "bad" transition, so how do I implement this?

score 0 · Accepted Answer · answered Feb 08 '21 at 00:14


I figured this out by going to the authors' publicly available GitHub code. It turned out the authors were just generating the transition probability $p$ from $\mathcal{N}(\mu,\sigma^2)$ at the beginning of each episode for some reason. I am answering it myself so that this question is not left unanswered.


dezdichado


Wait, but $p$ needs to be limited to $[0,1]$, if it's a probability. However, the Gaussian is not limited to $[0, 1]$. Maybe you could also provide a link to the source code. – nbro Feb 08 '21 at 00:34
@nbro I misworded it - the maximum number of episodes is $T = 20$, and the balloon explodes if the number of steps taken so far, $t$, exceeds a normal random variable drawn from $\mathcal{N}(16, 0.5)$ for the low-risk version of the task and from $\mathcal{N}(8, 0.5)$ for the high-risk version. – dezdichado Feb 08 '21 at 00:51
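For anyone landing here, this is a rough Python sketch of the mechanism described in the comment above, not the authors' code. It assumes a hidden threshold is drawn once per episode and the balloon bursts as soon as the step count $t$ exceeds it. The names `run_episode` and `burst_probability`, the `"pump"`/`"stop"` action labels, reading $T = 20$ as a cap on steps, and treating 0.5 as a standard deviation rather than a variance are all my assumptions. The second function recovers the discrete, step-dependent transition probability the question was asking for: the chance of bursting on the step that reaches $t$, given no burst so far.

```python
import math
import random

def run_episode(policy, mu=16.0, sigma=0.5, max_steps=20, rng=random):
    """Simulate one episode; returns (steps_taken, outcome).

    Assumption: sigma is a standard deviation. If 0.5 is a variance,
    pass sigma=math.sqrt(0.5) instead.
    """
    threshold = rng.gauss(mu, sigma)  # hidden burst point, drawn once per episode
    t = 0
    while t < max_steps:
        if policy(t) == "stop":       # hypothetical action label
            return t, "stopped"
        t += 1                        # the "pump" action advances the step count
        if t > threshold:             # bad transition: step count exceeds the draw
            return t, "burst"
    return t, "full"

def normal_cdf(x, mu, sigma):
    """CDF of a normal distribution with mean mu and standard deviation sigma."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def burst_probability(t, mu=16.0, sigma=0.5):
    """P(burst on the step that reaches t | no burst up to step t - 1)."""
    survived = 1.0 - normal_cdf(t - 1, mu, sigma)
    if survived <= 0.0:
        return 1.0
    return (normal_cdf(t, mu, sigma) - normal_cdf(t - 1, mu, sigma)) / survived

if __name__ == "__main__":
    always_pump = lambda t: "pump"
    print(run_episode(always_pump, rng=random.Random(0)))
    print([round(burst_probability(t), 3) for t in range(14, 19)])
```

Under these assumptions the high-risk version would just be `run_episode(policy, mu=8.0)`. Filling a transition matrix with `burst_probability(t)` for each $t$ should give the same dynamics as sampling the threshold, as long as the step count is part of the state.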
