
I am learning the expectation-maximization algorithm from the article Semi-Supervised Text Classification Using EM. The algorithm is very interesting, but it looks like it is doing circular inference here.

[Screenshot of the EM algorithm description from the paper]

I am not sure whether I am understanding the description correctly. What I perceived is the following (a code sketch of the loop appears after the steps):

Step 1: Train a NB classifier on the labeled data.
Repeat:
  Step 2 (E-step): Use the trained NB classifier to assign labels to the unlabeled data.
  Step 3 (M-step): Train a new NB classifier on the labeled data plus the unlabeled data (with the labels from step 2).
Until convergence.
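To check my reading, here is a minimal runnable sketch of that loop, using scikit-learn's MultinomialNB on hypothetical random count data (the paper works on word counts from documents, and its E-step actually uses the class posteriors as soft labels; the hard labels below are just to keep the sketch short):

```python
import numpy as np
from sklearn.naive_bayes import MultinomialNB

# Hypothetical toy data standing in for word-count features.
rng = np.random.default_rng(0)
X_labeled = rng.integers(0, 5, size=(20, 10))     # labeled documents
y_labeled = rng.integers(0, 2, size=20)           # their known labels
X_unlabeled = rng.integers(0, 5, size=(100, 10))  # unlabeled documents

# Step 1: train NB on the labeled data only.
clf = MultinomialNB().fit(X_labeled, y_labeled)

X_all = np.vstack([X_labeled, X_unlabeled])
prev = None
for _ in range(50):
    # Step 2 (E-step): label the unlabeled data with the current classifier.
    y_unlabeled = clf.predict(X_unlabeled)
    # Until convergence: stop when the assigned labels no longer change.
    if prev is not None and np.array_equal(prev, y_unlabeled):
        break
    prev = y_unlabeled
    # Step 3 (M-step): retrain on labeled + newly labeled data.
    y_all = np.concatenate([y_labeled, y_unlabeled])
    clf = MultinomialNB().fit(X_all, y_all)
```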

Here is the question:

In step 2, the labels are assigned by the classifier trained on the labeled data, which is the only source of knowledge about correct predictions. Step 3 (M-step) then updates the classifier using the labels generated in step 2. Since the whole process relies on the labeled data, how can the EM procedure improve the classifier? Can someone explain this to me?

  • Are you essentially asking why EM actually converges to a good solution (i.e. good $\hat{\theta}$)? – nbro Jul 08 '20 at 13:57
  • Yes, sorry for my poor description. – Cheleeger Ken Jul 08 '20 at 14:04
  • If you search for the word "convergence" in the same paper from which you took the screenshot, you will find some information about the convergence of the EM algorithm. – nbro Jul 08 '20 at 21:17
