
Reading through the CS229 lecture notes on generalised linear models, I came across the idea that a linear regression problem can be modelled with a Gaussian distribution, which is a member of the exponential family. The notes state that $h_{\theta}(x)$ is equal to $E[y | x; \theta]$. However, how can $h_{\theta}(x)$ be equal to the expectation of $y$ given input $x$ and $\theta$, since an expectation would require some sort of averaging to take place?

Given $x$, our goal is to predict the expected value of $T(y)$ given $x$. In most of our examples, we will have $T(y) = y$, so this means we would like the prediction $h(x)$ output by our learned hypothesis $h$ to satisfy $h(x) = E[y|x]$.

To show that ordinary least squares is a special case of the GLM family of models, consider the setting where the target variable $y$ (also called the response variable in GLM terminology) is continuous, and we model the conditional distribution of $y$ given $x$ as a Gaussian $N(\mu,\sigma^2)$. (Here, $\mu$ may depend on $x$.) So, we let the ExponentialFamily($\eta$) distribution above be the Gaussian distribution. As we saw previously, in the formulation of the Gaussian as an exponential family distribution, we had $\mu = \eta$. So, we have $$h_{\theta}(x) = E[y|x; \theta] = \mu = \eta = \theta^Tx.$$
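For reference, the step $\mu = \eta$ comes from writing the Gaussian density (with $\sigma^2$ fixed to $1$, as the notes do for simplicity) in the exponential-family form $p(y;\eta) = b(y)\exp\left(\eta^T T(y) - a(\eta)\right)$:

$$p(y;\mu) = \frac{1}{\sqrt{2\pi}}\exp\left(-\frac{1}{2}(y-\mu)^2\right) = \frac{1}{\sqrt{2\pi}}\exp\left(-\frac{y^2}{2}\right)\cdot\exp\left(\mu y - \frac{1}{2}\mu^2\right),$$

so $b(y) = \frac{1}{\sqrt{2\pi}}e^{-y^2/2}$, $T(y) = y$, $\eta = \mu$, and $a(\eta) = \eta^2/2$: the natural parameter of the Gaussian is exactly its mean.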

EDIT

Upon reading other sources, $y_i \sim N(\mu_i, \sigma^2)$, meaning that each individual output has its own normal distribution with mean $\mu_i$, and $h_{\theta}(x_i)$ is set as the mean of the normal distribution for $y_i$. In that case, it makes sense to set the hypothesis equal to the expectation.
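A small simulation makes this concrete (a sketch with made-up numbers, not anything from the notes): the "averaging" lives in the distribution of $y_i$ at a fixed $x_i$, not across the data set. Drawing many $y$ values at the same input and averaging them recovers $h_{\theta}(x_i) = \theta^T x_i$.

```python
import numpy as np

# Hypothetical parameters: each y_i ~ N(theta^T x_i, sigma^2), so the mean
# of y_i's distribution is exactly h_theta(x_i) = theta^T x_i.
rng = np.random.default_rng(0)
theta = np.array([2.0, -1.0])
sigma = 0.5

x_i = np.array([1.0, 3.0])   # one fixed input
mu_i = theta @ x_i           # h_theta(x_i) = 2*1 + (-1)*3 = -1.0

# Averaging many hypothetical draws of y at this *same* x_i recovers mu_i.
samples = rng.normal(loc=mu_i, scale=sigma, size=100_000)
print(mu_i)             # -1.0
print(samples.mean())   # close to -1.0
```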

calveeen

1 Answer


In generalised linear models, each output variable $y_i$ is modelled as coming from an exponential-family distribution, and the hypothesis function $h_{\theta}(x)$ for a given $\theta$ is the expected value of $y_i$. Maximum likelihood estimation is usually the method used to fit GLMs.
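As a sketch (with made-up data, not the notes' code): for the Gaussian case, maximizing the log-likelihood in $\theta$ is the same as minimizing the least-squares cost, so gradient ascent on the log-likelihood lands on the ordinary least-squares solution.

```python
import numpy as np

# Hypothetical data: y_i ~ N(theta^T x_i, sigma^2).
rng = np.random.default_rng(1)
n, d = 200, 3
X = rng.normal(size=(n, d))
theta_true = np.array([1.5, -2.0, 0.5])
y = X @ theta_true + rng.normal(scale=0.3, size=n)

# Gradient ascent on the Gaussian log-likelihood; its gradient in theta is
# proportional to X^T (y - X theta), the same as for least squares.
theta = np.zeros(d)
lr = 0.01
for _ in range(2000):
    theta += lr * X.T @ (y - X @ theta) / n

theta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
print(theta)      # matches the closed-form least-squares solution
print(theta_ols)
```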

calveeen