What does the parameter $y$ stand for in function $g(y,\mu,\sigma)$ related to REINFORCE algorithm?

Question

I am wondering what the parameter $y$ in the function $g(y,\mu,\sigma)=\frac{1}{(2\pi)^{1/2}\sigma}e^{-(y-\mu)^2/2\sigma^2}$ stands for in Section 6 (page 14) of the paper introducing the REINFORCE family of algorithms.

Drawing an analogy to Equation 4 of the same paper, I would guess that it refers to the outcome (i.e. sample) of sampling from a probability distribution parameterized by the parameters $\mu$ and $\sigma$. However, I am not sure whether that is correct or not.

As far as I can tell, it is the point at which you would evaluate the density function of the random variable (in this case, the normal distribution). In a policy gradient setting I imagine that $y$ here is the action. Does this sound correct? – David, Jan 10 '21 at 10:13

nbro · Accepted Answer · 2021-01-10T14:59:34.373

2

If you take a look at the Wikipedia page related to the normal distribution, you will see the definition of the Gaussian density

$$ f(x)=\frac{1}{\sigma\sqrt{2\pi}}e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^{2}} \label{1}\tag{1} $$

and you will see that the $y$ in your formula corresponds to the $x$ in equation \ref{1}.

I've seen this notation in the context of computer vision and image processing, where the Gaussian kernel is used to blur images.

So, as pointed out by someone in a comment, $y$ should indeed be the point where you evaluate the density.

Maybe the confusing part is that the notation $g(y,\mu,\sigma)$ lists all three arguments as if they played the same role, whereas $\mu$ and $\sigma$ are the parameters that define the specific density and $y$ is the input at which that density is evaluated.

After having read the relevant section of the paper, I now understand why you're confused. The author refers to $y$ as the output (not yet sure why: maybe it's the output of another unit that feeds this Gaussian unit?), but I think that this explanation still applies. The output of the Gaussian density $g$ is not $y$, but the density value that corresponds to $y$. In fact, in Appendix B of the paper, the author says that $Y$ is the support of $g$ and $y$ is an element of $Y$.
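
To make the two roles of $y$ concrete, here is a minimal Python sketch. The function name `g` mirrors the question's notation; the values of `mu` and `sigma` are illustrative assumptions, not taken from the paper. It draws a sample $y$ and then evaluates the density at that same $y$:

```python
import math
import random


def g(y: float, mu: float, sigma: float) -> float:
    """Gaussian density with mean mu and standard deviation sigma, evaluated at y."""
    coeff = 1.0 / (math.sqrt(2.0 * math.pi) * sigma)
    return coeff * math.exp(-((y - mu) ** 2) / (2.0 * sigma ** 2))


mu, sigma = 0.0, 1.0             # illustrative parameters (assumption)
y = random.gauss(mu, sigma)      # y as a sample drawn from the distribution
density_at_y = g(y, mu, sigma)   # y as the point where the density is evaluated

print(f"sampled y = {y:.3f}, g(y, mu, sigma) = {density_at_y:.3f}")
```

The value returned by `g` is the height of the density at $y$; it is not $y$ itself.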

edited Jan 10 '21 at 14:59

answered Jan 10 '21 at 13:47

nbro


So, in other words, $f(x)$ simply returns the probability of input value $x$ with respect to a Gaussian distribution parameterized by $\mu$ and $\sigma$. Is that correct? – Daniel B. Jan 11 '21 at 01:45
$f(x)$ is not a _probability_ but a [_density_ value](https://en.wikipedia.org/wiki/Probability_density_function), i.e. it's not necessarily a number in the range $[0, 1]$. Remember that, for a continuous r.v., the probability that it takes any specific value is _zero_. So, with continuous r.v.s, you typically want to calculate the probability that the r.v. lies in some range. Note also that when people use the term "probability distribution", they may refer to the pdf, if the r.v. is continuous. Maybe check [this answer](https://ai.stackexchange.com/a/16842/2444). – nbro Jan 11 '21 at 01:55
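
The difference between a density value and a probability can be checked numerically. The sketch below uses only the Python standard library; the helper names `normal_pdf` and `normal_cdf` and the chosen parameter values are illustrative assumptions. It shows a density value above 1 and a probability obtained over a range:

```python
import math


def normal_pdf(x: float, mu: float, sigma: float) -> float:
    """Density of a normal distribution evaluated at x."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def normal_cdf(x: float, mu: float, sigma: float) -> float:
    """P(X <= x) for a normal distribution, via the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


# A narrow Gaussian (sigma = 0.1) has a peak density larger than 1,
# so the density cannot be read as a probability.
narrow_peak = normal_pdf(0.0, 0.0, 0.1)

# A probability comes from a range: P(-1 <= X <= 1) for a standard normal.
prob_within_one_sigma = normal_cdf(1.0, 0.0, 1.0) - normal_cdf(-1.0, 0.0, 1.0)

print(narrow_peak, prob_within_one_sigma)
```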

Ok, thanks a lot! The referred answer helps quite a lot as well! – Daniel B. Jan 11 '21 at 02:27

A crucial bit I had overlooked in the paper (page 14) is "density function $g$ determining the output $y$". So, $y$ is indeed a sample from a Gaussian parameterized by $\mu$ and $\sigma$. I *suspect* that $y$ appears in the definition of $g$ because the partial derivatives of $g$, which depend on $y$, are used to compute the gradient for the policy network's output nodes, i.e. the nodes that produced the $\mu$ and $\sigma$ from which $y$ was sampled (those nodes are then updated taking into account the "quality" of the drawn sample $y$). – Daniel B. Jan 13 '21 at 02:13
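
For reference, the dependence of those partial derivatives on $y$ can be worked out directly from the density in the question. This is a sketch of the standard calculus, taking the logarithm of $g$ first (an assumption about which derivatives are meant, since REINFORCE-style updates are usually written in terms of $\ln g$):

$$ \ln g(y,\mu,\sigma) = -\tfrac{1}{2}\ln(2\pi) - \ln\sigma - \frac{(y-\mu)^2}{2\sigma^2} $$

$$ \frac{\partial \ln g}{\partial \mu} = \frac{y-\mu}{\sigma^2}, \qquad \frac{\partial \ln g}{\partial \sigma} = \frac{(y-\mu)^2-\sigma^2}{\sigma^3} $$

Both expressions contain the sampled $y$: a sample above the mean gives a positive derivative with respect to $\mu$, and a sample farther than one $\sigma$ from the mean gives a positive derivative with respect to $\sigma$.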

@DanielB. Yes, I think you're right, $y$ can be a sample from the Gaussian distribution (so, in a way, an "output" from the Gaussian distribution), but it can also denote an input to the Gaussian density. So, the samples from the Gaussian distribution should indeed be values that you can evaluate the Gaussian density at. For example, let's say that you have a Gaussian with $\mu = 0$ and $\sigma = 1$. Then you sample $y=0$ (i.e. a number from the Gaussian, which, in this case, happens to be the mean), then you can evaluate the density of $0$ (it would be the height of this Gaussian at zero). – nbro Jan 13 '21 at 08:26

He's probably using the letter $y$ rather than $x$ to emphasize that this is a sample (output) from a Gaussian distribution rather than the value you want to evaluate the Gaussian density at, given that $y$, in the context of neural networks and stuff, is usually used to denote an output. – nbro Jan 13 '21 at 08:45

I think it is also important to consider that the variable $x$ had already been reserved for the inputs to the NN layers that generate $\mu$ and $\sigma$. So, I think the author really intends to follow the standard machine learning convention of calling inputs $x$ and outputs $y$. And after all, that's pretty accurate, except that the NN (i.e. the policy network) is not a vanilla NN: its regular output layer is followed by an additional layer of random generators that produce outputs given the outputs of the previous layer. – Daniel B. Jan 13 '21 at 14:20
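
The picture in this comment (inputs $x$, a deterministic part producing $\mu$, then a random generator producing $y$) can be sketched in a few lines of Python. Everything here is an illustrative assumption rather than the paper's setup: the mean is a plain linear function of `x`, `sigma` is held fixed, the reward is a toy function, and the names `reinforce_gaussian_step`, `reward_fn`, `lr`, and `baseline` are hypothetical. The update uses $\partial \ln g / \partial \mu = (y-\mu)/\sigma^2$:

```python
import random


def reinforce_gaussian_step(w, x, sigma, reward_fn, lr=0.01, baseline=0.0):
    """One REINFORCE-style update for a single Gaussian unit with mean w . x."""
    mu = sum(wi * xi for wi, xi in zip(w, x))  # deterministic part: mean from inputs x
    y = random.gauss(mu, sigma)                # stochastic part: sampled output y
    r = reward_fn(y)                           # scalar reward for that sample
    # d ln g / d mu = (y - mu) / sigma^2, and d mu / d w_i = x_i (chain rule)
    grad_log = [(y - mu) / sigma ** 2 * xi for xi in x]
    new_w = [wi + lr * (r - baseline) * gi for wi, gi in zip(w, grad_log)]
    return new_w, y, r


random.seed(0)
w = [0.0, 0.0]
x = [1.0, 0.5]
target = 2.0  # hypothetical target, used only to define a toy reward

for _ in range(2000):
    w, y, r = reinforce_gaussian_step(
        w, x, sigma=0.5, reward_fn=lambda sample: -(sample - target) ** 2
    )

print("learned mean:", sum(wi * xi for wi, xi in zip(w, x)))
```

Samples $y$ that earn a higher reward pull $\mu$ toward themselves, which is why the sampled $y$ has to appear in $g$.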

score 0 · Answer 2 · answered Jan 10 '21 at 11:06


The paper states: "To simplify notation, we focus on one single unit and omit the usual unit index subscript throughout."

So they are simply removing the i-th index from the equation for simplicity. So g is a function of a given instance "y" and the parameters μ and σ.


John Rothman


But what, in particular, does the 'given instance "y"' refer to? – Daniel B. Jan 10 '21 at 11:17
"y" is just some measurement or data instance. So if you have a dataset of temperature measurements in the day, "y" could be one single temperature measurement in your dataset, e.g. 20 – John Rothman Jan 10 '21 at 11:38
