I have seen a variation of the importance sampling (IS) weights used in Prioritized Experience Replay (PER) in some implementations. The original paper (section 3.4) defines the weights as:
$$ w_{i}=\left(\frac{1}{N} \cdot \frac{1}{P(i)}\right)^{\beta} $$
Some implementations instead use something like this:
$$ w_{i}=\left(\frac{\min (P(i))}{P(i)}\right)^{\beta} $$
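To make the comparison concrete, here is a minimal NumPy sketch of how I read the two formulas. The function names `is_weights_paper` and `is_weights_min`, the example probabilities, and the value $\beta = 0.4$ are my own choices for illustration and are not taken from any particular implementation. I also assume the minimum is taken over the same $N$ entries that $P(i)$ is defined on.

```python
import numpy as np

def is_weights_paper(probs, beta):
    """First form: w_i = (1/N * 1/P(i))^beta."""
    n = probs.shape[0]
    return (1.0 / (n * probs)) ** beta

def is_weights_min(probs, beta):
    """Second form: w_i = (min_j P(j) / P(i))^beta."""
    return (probs.min() / probs) ** beta

# Hypothetical sampling probabilities P(i) for a buffer of N = 4 transitions.
probs = np.array([0.05, 0.15, 0.30, 0.50])
beta = 0.4

w_paper = is_weights_paper(probs, beta)
w_min = is_weights_min(probs, beta)

# Element-wise ratio between the two forms.
ratio = w_min / w_paper

print("paper form:", w_paper)
print("min form:  ", w_min)
print("ratio:     ", ratio)
```

Under these assumptions the ratio is the same for every $i$, namely $(N \cdot \min_j P(j))^{\beta}$, so the two forms differ only by a constant factor. The second form also equals the first divided by its largest value, which makes its maximum weight exactly 1.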
Does anyone know where the second formula comes from? Is there a reference that explains the reasoning behind it and the improvements obtained?
My intuition leads me to some conclusions about this new formula, though they are not necessarily correct:
- In the beginning, while the PER buffer still has empty positions, $\min(P(i)) \sim 0$, so the samples receive very little weight. The weights then grow substantially once the buffer reaches capacity, and also as the errors become small (in addition to the effect of the increasing $\beta$).
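The intuition about empty positions can be checked with a small sketch. It rests on an assumption I have not verified against any specific code: that unused slots hold priority 0 and may or may not be included when the minimum is taken. The function `min_form_weights`, its `include_empty` flag, and the example priorities are all hypothetical.

```python
import numpy as np

def min_form_weights(priorities, n_filled, beta, include_empty):
    """w_i = (min P / P(i))^beta, computed for the filled slots only.

    include_empty: if True, the minimum runs over every slot, so unused
    slots (priority 0) pull it down to 0. If False, only filled slots count.
    """
    priorities = np.asarray(priorities, dtype=np.float64)
    probs = priorities / priorities.sum()
    filled = probs[:n_filled]
    min_prob = probs.min() if include_empty else filled.min()
    return (min_prob / filled) ** beta

# Hypothetical buffer of capacity 8 with only the first 3 slots in use.
buffer_priorities = [1.0, 0.5, 2.0, 0.0, 0.0, 0.0, 0.0, 0.0]

w_with_empty = min_form_weights(buffer_priorities, 3, beta=0.4, include_empty=True)
w_filled_only = min_form_weights(buffer_priorities, 3, beta=0.4, include_empty=False)

print("min over all slots:   ", w_with_empty)
print("min over filled slots:", w_filled_only)
```

In this toy case, taking the minimum over all slots drives every weight to 0 while the buffer is partly empty, whereas restricting it to the filled slots gives weights in $(0, 1]$ with the lowest-probability sample at exactly 1. Which of the two behaviours a given implementation has would depend on how it initializes and queries its minimum.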
A GitHub implementation that applies this: link