
I've been looking online for a while for a source that explains these computations, but I can't find anywhere what $|A(s)|$ means. I guess $A$ is the action set, but I'm not sure about that notation:

$$\frac{\varepsilon}{|\mathcal{A}(s)|} \sum_{a} Q^{\pi}(s, a)+(1-\varepsilon) \max _{a} Q^{\pi}(s, a)$$

Here is the source of the formula.

I also want to clarify that I understand the idea behind the $\epsilon$-greedy approach and the motivation behind on-policy methods. I just had a problem understanding this notation (and also some other minor things). The author omitted some steps, so it felt like there was a jump in continuity, which is why I didn't get the notation. I'd be more than glad if someone could point me towards a better source where this is explained in detail.

nbro
Metrician
    From the pseudocode, it is pretty clear that $A(s)$ refers to the set of all possible actions, since in step *c)* the algorithm iterates through all actions $a$ taken from that set. That it is about the actions becomes apparent from the use of $a$. – Daniel B. Jul 14 '20 at 20:34
  • Yes, I realized that; I was asking more about the notation $|A(s)|$, but I get it now. Thanks. – Metrician Jul 14 '20 at 20:44

1 Answer


This expression: $|\mathcal{A}(s)|$ means

  • $|\quad|$ the size of

  • $\mathcal{A}(s)$ the set of actions in state $s$

or more simply the number of actions allowed in the state.
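As a small illustration (my own sketch, not from the linked source), here is how $|\mathcal{A}(s)|$ shows up when building the action probabilities of an $\epsilon$-greedy policy for one state. It assumes the action values for state $s$ are stored in a plain dict mapping each allowed action to $Q(s, a)$; the helper name `epsilon_greedy_probs` and the example actions are hypothetical.

```python
def epsilon_greedy_probs(q_values, epsilon):
    """Return pi(a|s) for an epsilon-greedy policy in a single state s.

    q_values: dict mapping each action in A(s) to its estimate Q(s, a)
    (a hypothetical representation chosen for this sketch).
    """
    n_actions = len(q_values)  # |A(s)|: the number of actions allowed in s
    greedy = max(q_values, key=q_values.get)  # ties go to the first action seen
    # Exploration: with probability epsilon pick uniformly among all |A(s)| actions.
    probs = {a: epsilon / n_actions for a in q_values}
    # Exploitation: with probability 1 - epsilon pick the greedy action.
    probs[greedy] += 1.0 - epsilon
    return probs


q = {"left": 1.0, "right": 3.0, "up": 0.5, "down": 2.0}
print(epsilon_greedy_probs(q, 0.1))
```

With four actions and $\epsilon = 0.1$, each action gets $0.1/4 = 0.025$ from exploration, and the greedy action `right` ends up with $0.025 + 0.9 = 0.925$.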

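This makes sense in the given formula because $\frac{\epsilon}{|\mathcal{A}(s)|}$ is then the probability that the exploration step of an $\epsilon$-greedy policy selects any particular action. The overall expression is the expected value of $Q^{\pi}(s, a)$ when $a$ is chosen by that policy. The first term covers the exploratory choice (every action, including the greedy one, weighted by $\frac{\epsilon}{|\mathcal{A}(s)|}$), and the second term covers the greedy choice (weighted by $1-\varepsilon$).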
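To check that reading numerically, the sketch below (my own, under the same assumed dict representation of $Q(s, \cdot)$; both function names are hypothetical) evaluates the formula from the question directly. It then compares the result with $\sum_a \pi(a \mid s)\, Q(s, a)$ computed from explicit $\epsilon$-greedy probabilities. The two should agree, since the greedy action's total probability is $\frac{\epsilon}{|\mathcal{A}(s)|} + (1-\epsilon)$.

```python
def epsilon_greedy_value(q_values, epsilon):
    """The formula from the question:
    eps / |A(s)| * sum_a Q(s, a) + (1 - eps) * max_a Q(s, a)."""
    n_actions = len(q_values)  # |A(s)|
    return (epsilon / n_actions) * sum(q_values.values()) + (1.0 - epsilon) * max(q_values.values())


def expected_q_under_policy(q_values, epsilon):
    """The same quantity written as sum_a pi(a|s) * Q(s, a) for epsilon-greedy pi."""
    n_actions = len(q_values)
    greedy = max(q_values, key=q_values.get)
    total = 0.0
    for a, q in q_values.items():
        # Every action gets eps/|A(s)|; the greedy one also gets 1 - eps.
        prob = epsilon / n_actions + (1.0 - epsilon if a == greedy else 0.0)
        total += prob * q
    return total


q = {"left": 1.0, "right": 3.0, "up": 0.5, "down": 2.0}
print(epsilon_greedy_value(q, 0.1), expected_q_under_policy(q, 0.1))
```

For these made-up values, both give $0.025 \times 6.5 + 0.9 \times 3 = 2.8625$, up to floating-point rounding.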

Neil Slater
  • Comments are not for extended discussion; this conversation has been [moved to chat](https://chat.stackexchange.com/rooms/118285/discussion-on-answer-by-neil-slater-what-does-the-term-mathcalas-mean-i). – nbro Jan 10 '21 at 15:56