
I have been seeing a variation of the importance sampling (IS) weights used in Prioritized Experience Replay (PER) in some implementations. The original paper (Section 3.4) defines them as:

$$ w_{i}=\left(\frac{1}{N} \cdot \frac{1}{P(i)}\right)^{\beta} $$

The variation looks something like this:

$$ w_{i}=\left(\frac{\min (P(i))}{P(i)}\right)^{\beta} $$

Does anyone know where it comes from? Is there a reference that explains the reasoning behind the new formula and the improvements obtained?
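To make the comparison concrete, here is a minimal NumPy sketch of both weightings. The priority values, `beta`, and the function names are made up for illustration and are not taken from any particular implementation.

```python
import numpy as np


def is_weights_paper(probs, beta):
    """w_i = (1/N * 1/P(i))^beta, the form given in Section 3.4."""
    probs = np.asarray(probs, dtype=float)
    n = probs.size
    return (1.0 / (n * probs)) ** beta


def is_weights_min(probs, beta):
    """w_i = (min_j P(j) / P(i))^beta, the variant asked about."""
    probs = np.asarray(probs, dtype=float)
    return (probs.min() / probs) ** beta


# Hypothetical priorities, normalized into sampling probabilities P(i).
priorities = np.array([0.5, 1.0, 2.0, 4.0])
probs = priorities / priorities.sum()
beta = 0.4  # illustrative value only

w_paper = is_weights_paper(probs, beta)
w_min = is_weights_min(probs, beta)

# Compare the variant against the paper's weights divided by their maximum.
print(w_paper / w_paper.max())
print(w_min)
```

In this sketch the two printed arrays match: the largest paper weight belongs to the transition with the smallest $P(i)$, so dividing by it cancels the $1/N$ factor and leaves $\left(\min(P(i))/P(i)\right)^{\beta}$. That suggests the variant may just be the original weights rescaled so that the largest one equals 1, but I have not found a reference confirming that this is the intent.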

My intuition about the new formula, which is not necessarily correct, leads me to the following conclusion:

  • In the beginning, while the PER buffer still has empty positions, $\min(P(i)) \sim 0$, so the samples receive very little weight. The weights then grow substantially once the buffer reaches capacity, and also as the error becomes low and as $\beta$ is increased.
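The empty-buffer effect I have in mind can be sketched as follows. This assumes that the minimum is taken over every slot of a fixed-capacity buffer and that empty slots hold priority 0. Both are assumptions on my part and may not match what a given implementation does; the capacity, priorities, and `beta` are made up.

```python
import numpy as np


def min_variant_weights(priorities, sampled_idx, beta):
    """Weights (min_j P(j) / P(i))^beta for the sampled indices."""
    priorities = np.asarray(priorities, dtype=float)
    probs = priorities / priorities.sum()
    return (probs.min() / probs[sampled_idx]) ** beta


capacity = 8                         # hypothetical buffer size
beta = 0.4                           # illustrative value only
filled = np.array([1.0, 2.0, 3.0])   # priorities of the stored transitions
sampled_idx = np.array([0, 1, 2])

# Case 1: the minimum runs over all slots, including empty ones at priority 0.
partial = np.zeros(capacity)
partial[: filled.size] = filled
w_with_empty = min_variant_weights(partial, sampled_idx, beta)

# Case 2: the minimum runs over the filled slots only.
w_filled_only = min_variant_weights(filled, sampled_idx, beta)

print(w_with_empty)    # every weight collapses to 0
print(w_filled_only)   # the lowest-priority sample gets weight 1
```

Under these assumptions the weights are exactly zero until the buffer fills up, whereas restricting the minimum to the filled slots keeps them in $(0, 1]$ from the start.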

Code on GitHub that applies this: link

HenDoNR