I am currently studying the paper *Learning and Evaluating Classifiers under Sample Selection Bias* by Bianca Zadrozny. In Section 3 ("Learning under sample selection bias"), the author says the following:
We can separate classifier learners into two categories:
- local: the output of the learner depends asymptotically only on $P(y \mid x)$
- global: the output of the learner depends asymptotically both on $P(x)$ and on $P(y \mid x)$.
The term "asymptotically" refers to the behavior of the learner as the number of training examples grows. The names "local" and "global" were chosen because $P(x)$ is a global distribution over the entire input space, while $P(y \mid x)$ refers to many local distributions, one for each value of $x$. Local learners are not affected by sample selection bias because, by definition $P(y \mid x, s = 1) = P(y \mid x)$ while global learners are affected because the bias changes $P(x)$.
Then, in Section 3.1.1 ("Naive Bayes"), the author says the following:
In practical Bayesian learning, we often make the assumption that the features are independent given the label $y$, that is, we assume that $$P(x_1, x_2, \dots, x_n \mid y) = P(x_1 \mid y) P(x_2 \mid y) \dots P(x_n \mid y).$$ This is the so-called naive Bayes assumption. With naive Bayes, unfortunately, the estimates of $P(y \mid x)$ obtained from the biased sample are incorrect. The posterior probability $P(y \mid x)$ is estimated as $$\dfrac{P(x_1 \mid y, s = 1) \dots P(x_n \mid y, s = 1) P(y \mid s = 1)}{P(x \mid s = 1)} ,$$ which is different (even asymptotically) from the estimate of $P(y \mid x)$ obtained with naive Bayes without sample selection bias. We cannot simplify this further because there are no independence relationships between each $x_i$, $y$, and $s$. Therefore, naive Bayes learners are global learners.
Since it is said that, for global learners, the output of the learner depends asymptotically both on $P(x)$ and on $P(y \mid x)$, what is it about $\dfrac{P(x_1 \mid y, s = 1) \dots P(x_n \mid y, s = 1) P(y \mid s = 1)}{P(x \mid s = 1)}$ that indicates that naive Bayes learners are global learners?
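To make the claim concrete, here is a small sanity check I ran. All probabilities are made up for illustration, and I compute the asymptotic quantities exactly from a toy joint distribution rather than by sampling; I also normalize the naive Bayes score over $y$, as implementations do, rather than dividing by $P(x \mid s = 1)$:

```python
# Exact toy computation: selection s depends only on x, so conditioning
# on s = 1 leaves P(y | x) intact, but it breaks the naive Bayes
# factorization. All probabilities below are made up for illustration.
from itertools import product

p_y = {0: 0.5, 1: 0.5}            # P(y)
p_x1 = {0: 0.3, 1: 0.8}           # P(x1 = 1 | y)
p_x2 = {0: 0.4, 1: 0.7}           # P(x2 = 1 | y)

def bern(p, v):                   # P(feature = v) for a Bernoulli(p)
    return p if v == 1 else 1.0 - p

def p_s1(x1, x2):                 # P(s = 1 | x): depends on x only, not on y
    return 0.9 if x1 == x2 else 0.1

joint, joint_s1 = {}, {}
for x1, x2, y in product([0, 1], repeat=3):
    p = p_y[y] * bern(p_x1[y], x1) * bern(p_x2[y], x2)  # features independent given y
    joint[(x1, x2, y)] = p                    # P(x1, x2, y)
    joint_s1[(x1, x2, y)] = p * p_s1(x1, x2)  # P(x1, x2, y, s = 1)

z = sum(joint_s1.values())                            # P(s = 1)
biased = {k: v / z for k, v in joint_s1.items()}      # P(x1, x2, y | s = 1)

def prob(dist, pred):             # probability of an event under dist
    return sum(v for k, v in dist.items() if pred(*k))

x = (1, 0)                        # query point

# True posterior and the "local" estimate from the biased distribution.
true_post = joint[(*x, 1)] / (joint[(*x, 0)] + joint[(*x, 1)])
local_post = biased[(*x, 1)] / (biased[(*x, 0)] + biased[(*x, 1)])

# Naive Bayes built from the biased distribution:
# P(x1 | y, s=1) * P(x2 | y, s=1) * P(y | s=1), normalized over y.
score = {}
for y in [0, 1]:
    py = prob(biased, lambda a, b, c: c == y)
    px1 = prob(biased, lambda a, b, c: a == x[0] and c == y) / py
    px2 = prob(biased, lambda a, b, c: b == x[1] and c == y) / py
    score[y] = px1 * px2 * py
nb_post = score[1] / (score[0] + score[1])

print(f"P(y=1 | x)                   = {true_post:.4f}")
print(f"P(y=1 | x, s=1)              = {local_post:.4f}")  # identical
print(f"naive Bayes on biased sample = {nb_post:.4f}")     # different
```

With these numbers I get $0.5714$, $0.5714$ and $0.4242$: the direct conditional $P(y \mid x, s = 1)$ matches $P(y \mid x)$ exactly, while the factored estimate is off (it even flips the decision). So I can verify the claim numerically; my question is about how to read the dependence on $P(x)$ off the expression above.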
EDIT: To be clear, for the example given in the local learner case (Section 3.1, "Bayesian classifiers"), the dependence on $P(y \mid x)$ alone is evident:
Bayesian classifiers compute posterior probabilities $P(y \mid x)$ using Bayes' rule: $$P(y \mid x) = \dfrac{P(x \mid y)P(y)}{P(x)}$$ where $P(x \mid y)$, $P(y)$ and $P(x)$ are estimated from the training data. An example $x$ is classified by choosing the label $y$ with the highest posterior $P(y \mid x)$.
We can easily show that Bayesian classifiers are not affected by sample selection bias. By using the biased sample as training data, we are effectively estimating $P(x \mid y, s = 1)$, $P(x \mid s = 1)$ and $P(y \mid s = 1)$ instead of estimating $P(x \mid y)$, $P(x)$ and $P(y)$. However, when we substitute these estimates into the equation above and apply Bayes' rule again, we see that we still obtain the desired posterior probability $P(y \mid x)$: $$\dfrac{P(x \mid y, s = 1) P(y \mid s = 1)}{P(x \mid s = 1)} = P(y \mid x, s = 1) = P(y \mid x)$$ since we are assuming that $y$ and $s$ are independent given $x$. Note that even though the estimates of $P(x \mid y, s = 1)$, $P(x \mid s = 1)$ and $P(y \mid s = 1)$ are different from the estimates of $P(x \mid y)$, $P(x)$ and $P(y)$, the differences cancel out. Therefore, Bayesian learners are local learners.
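For completeness, the "apply Bayes' rule again" step works out as

$$\dfrac{P(x \mid y, s = 1) P(y \mid s = 1)}{P(x \mid s = 1)} = \dfrac{P(x, y \mid s = 1)}{P(x \mid s = 1)} = P(y \mid x, s = 1),$$

which equals $P(y \mid x)$ by the independence assumption.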
Note that here we end up with exactly $P(y \mid x)$. In the global case, however, it is not clear how a dependence on both $P(x)$ and $P(y \mid x)$ (as required for global learners) can be read off $\dfrac{P(x_1 \mid y, s = 1) \dots P(x_n \mid y, s = 1) P(y \mid s = 1)}{P(x \mid s = 1)}$.
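My attempt so far: expanding a single naive Bayes factor with Bayes' rule (conditioning on $y$ throughout) gives

$$P(x_i \mid y, s = 1) = \dfrac{P(s = 1 \mid x_i, y) P(x_i \mid y)}{P(s = 1 \mid y)},$$

so each factor picks up a correction term $\dfrac{P(s = 1 \mid x_i, y)}{P(s = 1 \mid y)}$ that, unlike in the local case, does not cancel against the denominator $P(x \mid s = 1)$. But I do not see how to read a dependence on $P(x)$ off these correction terms.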