I am having a hard time converting line 6 of the prioritized experience replay algorithm from the original paper into plain English (see below):
I understand that new transitions (not visited before) are given maximal priority. On line 6 this would be done for every transition in an initial pass since the history is initialized as empty on line 2.
I’m having trouble with the notation $p_t = \max_{i<t} p_i$. Can someone please state this in plain English? If $t = 4$, for example, is $p_t = 4$? How is that equal to $\max_{i<t} p_i$?
In my contrived example, it seems $\max_{i<t} p_i$ would be 3, so I must be misreading the notation.
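To make the question concrete, here is a minimal Python sketch of one possible reading of the notation, in which the maximum is taken over the stored priority values $p_i$ of earlier transitions rather than over the indices $i$. The priority values, the `priorities` list, and the helper `priority_for_new_transition` are all made up for illustration and do not come from the paper; the fallback of 1.0 for an empty history is likewise an assumption.

```python
def priority_for_new_transition(priorities, default=1.0):
    """Return the largest priority among earlier transitions.

    `priorities` holds p_1, ..., p_{t-1}. If the history is empty,
    fall back to `default` (an assumed initial priority).
    """
    return max(priorities, default=default)


# Hypothetical priorities p_1, p_2, p_3 of the transitions stored before t = 4.
priorities = [0.5, 2.0, 0.7]

# Under this reading, p_4 = max(p_1, p_2, p_3): a max over values, not indices.
p_4 = priority_for_new_transition(priorities)
print(p_4)  # 2.0
```

If this reading is right, $p_4$ in the example would be 2.0, the largest priority stored so far, and not 4 or 3, since those are index values and not priorities.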