
According to DeepMind's paper Prioritized Experience Replay (2016), specifically Appendix B.2.1 "Proportional prioritization" (p. 13), one should divide the priority range $[0, p_\text{total}]$ equally into $k$ sub-ranges, where $k$ is the batch size, and sample one value uniformly from each sub-range. Each sampled value is then used to retrieve an experience from the sum-tree, so that experiences are drawn with probability proportional to their priority.
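To make the procedure concrete, here is a minimal sketch of how I understand it, not the paper's reference implementation. The names `SumTree`, `find`, and `stratified_sample` are my own, and the flat-array tree layout is an assumption chosen for brevity.

```python
import random


class SumTree:
    """Hypothetical minimal sum-tree: leaves hold priorities, parents hold sums."""

    def __init__(self, priorities):
        self.n = len(priorities)
        # Flat layout: node i has children 2i and 2i+1; leaves sit at n..2n-1.
        # Leaf order matches input order when n is a power of two.
        self.tree = [0.0] * (2 * self.n)
        self.tree[self.n:] = [float(p) for p in priorities]
        for i in range(self.n - 1, 0, -1):
            self.tree[i] = self.tree[2 * i] + self.tree[2 * i + 1]

    @property
    def total(self):
        # The root stores p_total, the sum of all priorities.
        return self.tree[1]

    def find(self, s):
        """Return the leaf index whose cumulative-priority interval contains s."""
        i = 1
        while i < self.n:
            left = 2 * i
            if s < self.tree[left]:
                i = left
            else:
                s -= self.tree[left]
                i = left + 1
        return i - self.n


def stratified_sample(tree, k, rng=random):
    """Split [0, p_total] into k equal sub-ranges and draw one value from each."""
    segment = tree.total / k
    return [tree.find(rng.uniform(i * segment, (i + 1) * segment))
            for i in range(k)]


tree = SumTree([1, 2, 3, 4])          # p_total = 10
batch = stratified_sample(tree, k=2)  # one draw from [0, 5], one from [5, 10]
```

With priorities `[1, 2, 3, 4]` and $k = 2$, the first draw can only land on indices 0, 1 or 2, and the second only on indices 2 or 3.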

Why do we need to do that? Why not simply sample $k$ values uniformly from $[0, p_\text{total}]$ and retrieve the $k$ corresponding experiences from the sum-tree, without dividing the priority range into $k$ sub-ranges? Isn't this the same?
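To make the comparison concrete, here is a small sketch of the two schemes side by side. It is only an illustration under my own assumptions: a prefix-sum array with `bisect` stands in for the sum-tree (it maps a value to the same experience), and the function names `make_lookup`, `independent_batch`, and `stratified_batch` are hypothetical.

```python
import bisect
import itertools
import random


def make_lookup(priorities):
    """Return (p_total, lookup), where lookup(s) maps a value s to an index."""
    prefix = list(itertools.accumulate(priorities))

    def lookup(s):
        # Item i covers [prefix[i-1], prefix[i]); clamp in case s == p_total.
        return min(bisect.bisect_right(prefix, s), len(prefix) - 1)

    return prefix[-1], lookup


def independent_batch(priorities, k, rng):
    """k independent draws from the whole range [0, p_total]."""
    total, lookup = make_lookup(priorities)
    return [lookup(rng.uniform(0, total)) for _ in range(k)]


def stratified_batch(priorities, k, rng):
    """One draw from each of k equal sub-ranges of [0, p_total]."""
    total, lookup = make_lookup(priorities)
    segment = total / k
    return [lookup(rng.uniform(i * segment, (i + 1) * segment))
            for i in range(k)]


rng = random.Random(0)
equal = [1, 1, 1, 1]
print("independent:", sorted(independent_batch(equal, 4, rng)))
print("stratified: ", sorted(stratified_batch(equal, 4, rng)))
```

With four equal priorities and $k = 4$, each sub-range coincides with one experience, so the stratified batch contains every index once (barring floating-point ties at the sub-range boundaries), whereas the independent batch may contain repeats. In both schemes the expected number of times experience $i$ appears in a batch works out to $k \, p_i / p_\text{total}$, so the difference I can see is in how evenly a single batch covers the priority range, not in the average.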

nbro
Firas_

0 Answers