Where does this variation of the importance sampling weight come from?

Asked Jan 21 '21 at 13:40

Active Jan 29 '21 at 16:24

Viewed 94 times

I have been seeing a variation of the importance sampling (IS) weight in some implementations of Prioritized Experience Replay (PER). The original paper (section 3.4) defines the weight as:

$$ w_{i}=\left(\frac{1}{N} \cdot \frac{1}{P(i)}\right)^{\beta} $$

Some implementations instead use something like this:

$$ w_{i}=\left(\frac{\min (P(i))}{P(i)}\right)^{\beta} $$

Does anyone know where it comes from? Is there a reference that explains the reason for the new formula and the improvements obtained?

My intuition leads me to some conclusions about this new formula, which are not necessarily correct:

In the beginning, supposing that the PER buffer still has empty positions, $\min(P(i)) \sim 0$, so the samples do not receive much weight. The weights then grow substantially once the buffer reaches capacity, and also as the error becomes low (plus the increasing $\beta$).
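To make the comparison concrete, here is a minimal NumPy sketch of the two formulas. The function names and the example probabilities are made up for illustration and are not taken from any of the implementations mentioned. Under these assumptions, dividing the first form by its largest value reproduces the second form, because the factor $N$ cancels; whether that is the actual origin of the variant is not confirmed here.

```python
import numpy as np


def is_weights_paper(p, beta):
    """w_i = (1/N * 1/P(i))^beta, the form quoted from section 3.4."""
    n = p.size
    return (1.0 / (n * p)) ** beta


def is_weights_min(p, beta):
    """w_i = (min(P) / P(i))^beta, the variant seen in some implementations."""
    return (p.min() / p) ** beta


beta = 0.4  # hypothetical value, chosen only for the example

# Hypothetical sampling probabilities P(i); they sum to 1.
p = np.array([0.5, 0.3, 0.15, 0.05])
w_paper = is_weights_paper(p, beta)
w_min = is_weights_min(p, beta)

# Scaling the first form by its maximum gives the second form:
# (1/(N P_i))^beta / (1/(N P_min))^beta = (P_min / P_i)^beta
rescaled = w_paper / w_paper.max()
print(np.allclose(rescaled, w_min))

# Same buffer, but one entry has a probability close to zero
# (a stand-in for the "empty positions" case described above).
p_sparse = np.array([0.5, 0.3, 0.2 - 1e-6, 1e-6])
w_min_sparse = is_weights_min(p_sparse, beta)

# The weight of the first sample shrinks when min(P) is near zero.
print(w_min[0], w_min_sparse[0])
```

In this sketch the variant never exceeds 1, and a near-zero minimum probability pushes every other weight toward zero, which matches the intuition described above.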

Code on GitHub that applies the second formula: link

edited Jan 29 '21 at 16:24

HenDoNR

0 Answers