I am reading the paper "Anxiety, Avoidance and Sequential Evaluation" and am confused about the implementation of a specific lab study. Namely, the authors model the so-called Balloon task using a simple MDP, whose description is below:

[Image: the paper's description of the Balloon task MDP]

My confusion is the following sentence:

...The probability of this bad transition was modeled using normal density function, with parameters $N(16, 0.5)$

But the fact that this is a continuous normal distribution has me stumped. In MDPs, there is usually a nice, discrete transition matrix, so there is no ambiguity about how to implement it. For instance, if they said the transition to a bad state is modeled by a Bernoulli random variable with parameter $p$, then it would be clear how to implement it. I would do something like:

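import random
def step(curr_state, curr_action, p):
    next_state = curr_state  # placeholder for the ordinary transition
    if random.random() < p:  # Bernoulli(p) draw: bad transition with probability p
        next_state = "bad_state"
    return next_state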

But they are using a normal random variable for this "bad" transition, so how do I implement this?

nbro
dezdichado

1 Answer

I figured this out by going through the authors' publicly available GitHub code. It turned out the authors were just generating the transition probability $p$ from $\mathcal{N}(\mu,\sigma^2)$ at the beginning of each episode for some reason. I am answering this myself so that the question does not remain unanswered.

dezdichado
  • Wait, but $p$ needs to be limited to $[0,1]$, if it's a probability. However, the Gaussian is not limited to $[0, 1]$. Maybe you could also provide a link to the source code. – nbro Feb 08 '21 at 00:34
  • @nbro I misworded it - the maximum number of episodes is $T = 20$, and the balloon explodes if the number of steps taken so far, $t$, exceeds a normal random variable drawn from $\mathcal{N}(16, 0.5)$ for the low-risk version of the task and $\mathcal{N}(8, 0.5)$ for the high-risk version. – dezdichado Feb 08 '21 at 00:51
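
For anyone landing here later, the threshold mechanism described in the last comment can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' code: the names `run_episode`, `sample_burst_threshold`, `burst_probability`, and `policy` are made up for this sketch, the $0.5$ is assumed to be a standard deviation (the notation does not say whether it is a standard deviation or a variance), and $T = 20$ is treated as a cap on the number of steps within an episode.

```python
import math
import random

MAX_STEPS = 20  # assumption: T = 20 is used here as the cap on steps per episode


def sample_burst_threshold(mean=16.0, std=0.5, rng=random):
    """Draw the per-episode burst threshold (assumes 0.5 is a standard deviation)."""
    return rng.gauss(mean, std)


def run_episode(policy, mean=16.0, std=0.5, rng=random):
    """Simulate one episode and return (steps_taken, burst).

    `policy(t)` is a hypothetical callable: True means "take step t",
    False means "stop now".
    """
    threshold = sample_burst_threshold(mean, std, rng)
    for t in range(1, MAX_STEPS + 1):
        if not policy(t):
            return t - 1, False  # agent stopped voluntarily before step t
        if t > threshold:
            return t, True  # bad transition: the balloon explodes
    return MAX_STEPS, False


def burst_probability(t, mean=16.0, std=0.5):
    """P(threshold < t), i.e. the chance the balloon has burst by step t
    if the agent never stops, computed from the normal CDF."""
    z = (t - mean) / std
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
```

Under these assumptions, `burst_probability(t)` gives the discrete, step-indexed probability that the question was looking for: it is close to 0 for small $t$, equals 0.5 at $t = 16$ in the low-risk version, and is close to 1 shortly after. Passing `mean=8.0` would correspond to the high-risk version.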