
I was reading this paper https://www.cs.toronto.edu/~vmnih/docs/dqn.pdf, and it presents the following algorithm for deep Q-learning with experience replay:

[Algorithm 1: Deep Q-learning with Experience Replay]

On line 12, where the algorithm sets the value of $y_j$, the second case reads:

$$r_j + \gamma \max_{a'} Q(\phi_{j+1}, a'; \theta)$$

I'm confused as to what $a'$ refers to and where it comes from.

(Edit) Why is it $a$ on this line (line 7):

$$a_t = \max_a Q^*(\phi(s_t), a; \theta)$$

but $a'$ on line 12?

Can someone please explain it to me?

Ness

1 Answer


$r_j + \gamma \max_{a'}Q(\phi_{j+1},a';\theta)$
I'm confused as to what $a'$ refers to and where it comes from.

Here $a'$ is a "dummy" argument over which you perform the maximization operation $\max_{a'}$.

In practice, that would correspond to the axis (or dim) argument in numpy/pytorch/tensorflow: you compute the Q-values for all actions and take the maximum along the action dimension.
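
For example, here is a minimal PyTorch sketch (the network q_net, its sizes, and the batch below are made up for illustration and are not part of the paper):

```python
import torch

# Hypothetical Q-network: maps a state (4 features here) to one Q-value per action (2 actions here).
q_net = torch.nn.Linear(4, 2)

phi_next = torch.randn(32, 4)      # a batch of next states phi_{j+1}
r = torch.randn(32)                # rewards r_j
gamma = 0.99

q_values = q_net(phi_next)         # shape (32, 2): Q(phi_{j+1}, a'; theta) for every action a'

# max_{a'} Q(phi_{j+1}, a'; theta): a' is just the dimension we maximize over (dim=1).
max_q, _ = q_values.max(dim=1)     # shape (32,)

y = r + gamma * max_q              # targets y_j for non-terminal transitions
```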

$a_t = \max_a Q^*(\phi(s_t),a;\theta)$
Why is it $a$ on line 7?

I'd say that in this case it is sloppy mathematical notation (or just a typo) on the authors' part. Line 7 selects an action, so it should be argmax, not max: $$a_t = \arg \max_a Q^*(\phi(s_t),a;\theta)$$
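
To make the max/argmax distinction concrete, here is a small sketch under the same made-up setup as above:

```python
import torch

q_net = torch.nn.Linear(4, 2)      # hypothetical Q-network, as above

phi_t = torch.randn(1, 4)          # preprocessed state phi(s_t)
q_values = q_net(phi_t)            # shape (1, 2)

# Line 7 needs an *action*, i.e. the index of the best Q-value, so argmax is the right operation.
a_t = q_values.argmax(dim=1).item()

# max would instead give the best Q-*value*, which is what line 12 uses inside the target y_j.
best_q = q_values.max(dim=1).values.item()
```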

Kostya
  • So, it doesn't refer to an action a'? – Ness Jan 09 '23 at 16:48
  • Could you check my edit please? – Ness Jan 09 '23 at 17:07
  • 2
    @Ness: It is both a dummy argument, and refers to a *potential* action taken as $a_{t+1}$, but not necessarily the *actual* observed action taken during training. For Q-learning, you don't look at the next real action choice, but optimise based on the current maximising action choice, whilst other methods may want to use expected action on the behaviour policy or real observed action that the agent takes next – Neil Slater Jan 09 '23 at 17:13
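
To illustrate the comment above: the Q-learning target in line 12 bootstraps from the maximizing (potential) next action, not from the action the agent actually takes next. For contrast, a SARSA-style target (one of the "other methods" mentioned, not what DQN does) would use the observed next action; the sketch below reuses the made-up q_net from the earlier examples:

```python
import torch

q_net = torch.nn.Linear(4, 2)           # hypothetical Q-network, as in the sketches above

phi_next = torch.randn(1, 4)            # next state phi_{j+1}
r, gamma = 1.0, 0.99
a_next_observed = 0                     # action the agent actually took next (ignored by Q-learning)

q_next = q_net(phi_next)                # shape (1, 2)

# Q-learning / DQN target: bootstrap from the current maximizing action.
y_q_learning = r + gamma * q_next.max(dim=1).values.item()

# SARSA-style target (not what DQN does): bootstrap from the action actually taken next.
y_sarsa = r + gamma * q_next[0, a_next_observed].item()
```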