
I am taking an ML course and I am confused about some of the math derivations.


Could you explain the two steps I marked on the slides? For the first step, I thought $P(\beta|X,y) = \frac{P(X,y|\beta)P(\beta)}{P(X,y)}$, but I don't know how to get from there to the next step. Maybe I am confused by the conditional notation versus the semicolon notation.

tesio

1 Answer


For step 1: you're on the right track, but the derivation is easier if we leave everything conditional on $X$ $$p(\beta| X,y) = \frac{p(y| X, \beta) p(\beta | X)}{p(y |X)}$$ Now, the two key steps. First, $p(\beta | X) = p(\beta)$ if $X$ and $\beta$ are independent (why is this the case?). Second, $p(y |X)$ is constant in $\beta$. How does multiplying a function by a constant affect the location of its maximum?
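Written out, the conditioned form of Bayes' rule follows directly from the definition of conditional probability (a sketch, using the same symbols as above):

```latex
p(\beta \mid X, y)
  = \frac{p(\beta, y \mid X)}{p(y \mid X)}
  = \frac{p(y \mid X, \beta)\, p(\beta \mid X)}{p(y \mid X)}
```

With $p(\beta \mid X) = p(\beta)$ and $p(y \mid X)$ constant in $\beta$, maximizing the left-hand side over $\beta$ is the same as maximizing $p(y \mid X, \beta)\, p(\beta)$.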

For step 2: simply substitute the expressions given for $p(y|X, \beta)$ and $p(\beta)$. There will be additional terms that don't depend on $\beta$. How does adding a constant to a function affect the location of its maximum?
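As a concrete numerical check (a sketch, assuming the slides use the standard setup of a Gaussian likelihood $y \sim \mathcal{N}(X\beta, \sigma^2 I)$ and a Gaussian prior $\beta \sim \mathcal{N}(0, \tau^2 I)$): after substituting and taking logs, the log posterior is, up to constants in $\beta$, $-\|y - X\beta\|^2/(2\sigma^2) - \|\beta\|^2/(2\tau^2)$, whose maximizer is the ridge solution $(X^\top X + \lambda I)^{-1} X^\top y$ with $\lambda = \sigma^2/\tau^2$:

```python
import numpy as np

# Assumed setup (not from the slides themselves): Gaussian likelihood
# y ~ N(X beta, sigma^2 I) and Gaussian prior beta ~ N(0, tau^2 I).
rng = np.random.default_rng(0)
n, d = 50, 3
X = rng.normal(size=(n, d))
beta_true = np.array([1.0, -2.0, 0.5])
sigma, tau = 0.5, 1.0
y = X @ beta_true + sigma * rng.normal(size=n)

# Closed-form MAP estimate: ridge regression with lambda = sigma^2 / tau^2.
lam = sigma**2 / tau**2
beta_map = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

def log_posterior(beta):
    # Unnormalized log posterior; terms constant in beta are dropped,
    # which does not change the location of the maximum.
    return (-np.sum((y - X @ beta) ** 2) / (2 * sigma**2)
            - np.sum(beta**2) / (2 * tau**2))

# Sanity check: random perturbations of beta_map never score higher.
for _ in range(100):
    nearby = beta_map + 0.01 * rng.normal(size=d)
    assert log_posterior(beta_map) >= log_posterior(nearby)
```

The point of the check is that dropping the constants ($p(y|X)$ and the normalizing terms of the Gaussians) leaves the argmax unchanged.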

Luke
  • How do we get the first equality of leaving everything conditional on $X$? I don't understand what leaving everything conditional on $X$ means. Also, how do we know that $X$ and $\beta$ are independent? – tesio Mar 09 '23 at 00:55
  • The idea is that we can apply Bayes' rule to conditional distributions that are all conditioned on the same thing. In other words, if we ignored the $X$ conditioning in all terms of the above equation, we would get Bayes' rule. It's also valid with the conditioning there. You can prove this by fiddling around with the definitions of conditional distributions. In terms of independence: should we expect the slope of a linear function to depend on the points at which we probe it? – Luke Mar 09 '23 at 01:09
  • I understand the conditional part. For the independence, do you mean that the slope doesn't change if $X$ changes? Then, the slope depends on $y$ but not $X$? – tesio Mar 09 '23 at 01:31
  • Yes indeed. For linear regression, the model that we are fitting is $X \rightarrow y \leftarrow \beta$, so $X$ and $\beta$ are marginally independent. Note that in standard linear regression, $X$ is not even a random variable, but rather is assumed to be known exactly. – Luke Mar 09 '23 at 17:30
  • Thank you so much! – tesio Mar 10 '23 at 02:01