
I am having a hard time translating line 6 of the prioritized experience replay algorithm from the original paper into plain English.

I understand that new transitions (ones not seen before) are given maximal priority. On line 6, this would be done for every transition in an initial pass, since the history is initialized as empty on line 2.

I'm having trouble with the notation $p_t = \max_{i<t} p_i$. Can someone please state this in plain English? If, for example, $t = 4$, is $p_t = 4$? How is that equal to $\max_{i<t} p_i$?

It seems that in my contrived example, $\max_{i<t} p_i$ would be 3. I must be misreading this notation.

nbro
Hanzy

1 Answer


As I interpret it, $p_t$ is the priority value associated with transition $t$, and $p_t = \max_{i<t} p_i$ means that the priority of transition number $t$ is the maximum of the priorities of all previously stored transitions $i < t$. The index $t$ only tells you which earlier priorities to look at; it is not itself the priority value.
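To make the notation concrete, here is a minimal Python sketch (the function name `new_priority` and the priority values are hypothetical, chosen only for illustration). The list passed in plays the role of $p_1, \dots, p_{t-1}$, so for the asker's case $t = 4$ it holds three values, and the result is their maximum rather than the number 4.

```python
def new_priority(priorities):
    """Return p_t = max_{i<t} p_i.

    `priorities` holds p_1, ..., p_{t-1}, i.e. the priorities of the
    transitions already stored. The index t just determines how many
    earlier priorities there are; it is never the priority itself.
    """
    return max(priorities)


# t = 4: look at p_1, p_2, p_3 (hypothetical values).
print(new_priority([1.0, 1.0, 1.0]))  # 1.0, not 4
print(new_priority([0.5, 2.0, 0.3]))  # 2.0
```

The list must be non-empty; in the algorithm this is guaranteed because $p_1$ is initialized before anything is stored.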

Example: since $p_1$ is initialized to $1$, every new experience also gets priority $1$, as long as none of the stored priorities has been updated yet: \begin{equation} p_2 = \max\{p_1\} = 1, \end{equation}

\begin{equation} p_3 = \max\{p_1,p_2\} = 1, \end{equation}

\begin{equation} p_4 = \max\{p_1,p_2,p_3\} = 1. \end{equation}
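The example above can be reproduced with a small buffer sketch. This is a simplified illustration, not the paper's implementation: the class `PrioritizedBuffer`, its methods, and the TD error value are hypothetical. It also shows why the priorities stay at $1$ only until an update happens: once a replayed transition's priority is set to its absolute TD error $|\delta_j|$, the maximum used for the next new transition can change.

```python
class PrioritizedBuffer:
    """Hypothetical minimal buffer illustrating p_t = max_{i<t} p_i."""

    def __init__(self, p1=1.0):
        self.transitions = []
        self.priorities = []
        self.p1 = p1  # priority given to the very first transition

    def store(self, transition):
        # New transition gets the largest priority among those already stored;
        # the first one gets p_1.
        p_t = max(self.priorities) if self.priorities else self.p1
        self.transitions.append(transition)
        self.priorities.append(p_t)
        return p_t

    def update(self, j, td_error):
        # After transition j (0-based) is replayed, its priority becomes |delta_j|.
        self.priorities[j] = abs(td_error)


buf = PrioritizedBuffer()
print([buf.store(f"transition {t}") for t in range(1, 5)])  # [1.0, 1.0, 1.0, 1.0]
buf.update(1, -2.5)  # hypothetical TD error for transition 2
print(buf.store("transition 5"))  # 2.5
```

Taking `max` over the whole list costs $O(n)$ per insertion; practical implementations typically track the maximum more cheaply (for example with a running maximum or a tree structure), which this sketch deliberately skips.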

Miguel Saraiva