Confusion about distribution notation in the Deep Learning book

Question

In chapter 5 of the Deep Learning book by Ian Goodfellow et al., some of the notation in the loss functions (formulas 5.96, 5.100, and 5.101) confuses me.

As I understand it, $x,y \sim p_{data}$ means that a sample $(x, y)$ is drawn from the original data distribution (with $y$ being the ground-truth label). The loss function in formula 5.101 seems correct to me. Formula 5.101 is derived from 5.100 by adding a regularization term.

Given that, the notation $x,y \sim \hat{p}_{data}$ in formulas 5.96 and 5.100 confuses me: are those loss functions defined correctly, or is this a typo? If it is not a typo, could you help me understand what the two notations mean, and whether they are equivalent and correct?

Many thanks for your help.

nbro · Accepted Answer · 2019-05-25T13:00:33.477

On page 130 of the same book, the authors state that $\hat{p}_\text{data}$ is an empirical distribution defined by the training data. Similarly, on page 129, they state that $p_\text{data}$ is the true distribution that generates the set $\mathbb{X} = \{ \boldsymbol{x}^{(1)}, \dots, \boldsymbol{x}^{(m)} \}$.

What is the difference between $\hat{p}_\text{data}$ and $p_\text{data}$? You can think of $\hat{p}_\text{data}$ as a histogram that is calculated from the set $\mathbb{X}$ and $p_\text{data}$ as the true distribution from which the elements in $\mathbb{X}$ are drawn.
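To make the histogram picture concrete, here is a minimal Python sketch. It is not from the book: the three-outcome distribution `P_DATA`, the helper names, and the squared-value stand-in for a loss are all assumptions chosen for illustration. It draws a dataset from a known "true" distribution, builds the empirical distribution as normalised counts, and compares expectations under both.

```python
import random
from collections import Counter

# Hypothetical "true" distribution p_data over three outcomes (assumed for illustration).
P_DATA = {0: 0.5, 1: 0.3, 2: 0.2}

def sample_dataset(m, seed=0):
    """Draw m i.i.d. samples from P_DATA; this plays the role of the training set."""
    rng = random.Random(seed)
    outcomes = list(P_DATA)
    weights = [P_DATA[k] for k in outcomes]
    return rng.choices(outcomes, weights=weights, k=m)

def empirical_distribution(data):
    """p_hat_data: relative frequency of each observed value (a normalised histogram)."""
    counts = Counter(data)
    m = len(data)
    return {value: count / m for value, count in counts.items()}

def expectation(dist, f):
    """Expectation of f under a discrete distribution given as {value: probability}."""
    return sum(prob * f(value) for value, prob in dist.items())

def f(x):
    """Stand-in for a per-example loss (hypothetical)."""
    return x ** 2

data = sample_dataset(m=1000)
p_hat = empirical_distribution(data)

exp_under_p_hat = expectation(p_hat, f)            # expectation under the empirical distribution
sample_mean = sum(f(x) for x in data) / len(data)  # plain average over the dataset
exp_under_p_data = expectation(P_DATA, f)          # expectation under the true distribution

print(p_hat)
print(exp_under_p_hat, sample_mean, exp_under_p_data)
```

The expectation under `p_hat` coincides with the plain average over the dataset (up to floating-point error), while the expectation under `P_DATA` is only approximated by it; the approximation generally improves as `m` grows.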

The subscript ${\boldsymbol{x}, y \sim \hat{p}_\text{data}}$ in the expectation $\mathbb{E}_{\boldsymbol{x}, y \sim \hat{p}_\text{data}}$ indicates that the expectation is taken with respect to the samples drawn from the empirical distribution $\hat{p}_\text{data}$. In other words, you will optimise the objective function $J$ using the training data. Have a look at this question for more info.
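Concretely, assuming the training set contains $m$ examples $(\boldsymbol{x}^{(i)}, y^{(i)})$ and writing $L$ for a per-example loss with parameters $\boldsymbol{\theta}$ (symbols introduced here for illustration), the empirical distribution puts probability mass $1/m$ on each training example, so the expectation reduces to an average over the training set:

$$
\mathbb{E}_{\boldsymbol{x}, y \sim \hat{p}_\text{data}} \, L(\boldsymbol{x}, y, \boldsymbol{\theta})
= \sum_{i=1}^{m} \frac{1}{m} \, L(\boldsymbol{x}^{(i)}, y^{(i)}, \boldsymbol{\theta})
= \frac{1}{m} \sum_{i=1}^{m} L(\boldsymbol{x}^{(i)}, y^{(i)}, \boldsymbol{\theta}).
$$

The same expectation taken with respect to $p_\text{data}$ could not be computed this way, because the true distribution is unknown and only samples from it are available.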

The subscript ${\boldsymbol{x}, y \sim p_\text{data}}$ in the expectation of formula $5.101$ is a typo. In fact, in this online version of the book, at page 151, the subscript of the expectation is ${\boldsymbol{x}, y \sim \hat{p}_\text{data}}$.

Thank you a lot. your answer go just straight to my confusion :D — David Ng, May 25 '19 at 14:14
