
Let $S$ be a finite subset of $\mathbb{R}^k$ partitioned into $N$ subsets $S_1, \ldots, S_N$, and let $n_j = |S_j|$. Writing $\mathbb{E}[A]$ for the mean (centroid) of a finite set $A$, the between-groups sum of squares of the partition is defined as $$bSS(S_1,\ldots, S_N) = \sum_{j=1}^{N} n_j ||\mathbb{E}[S_j] - \mathbb{E}[S]||^2$$ The within-group sum of squares of each $S_j$ is defined as $$SS(S_j) = \sum_{u \in S_j} ||u - \mathbb{E}[S_j]||^2$$ The sum of squares of the set $S$ is defined as $$SS(S) = \sum_{v \in S} ||v - \mathbb{E}[S]||^2$$

It is, I think, well-known that $$SS(S) = bSS(S_1,\ldots, S_N) + \sum_{j=1}^{N} SS(S_j)$$
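
(To make the decomposition concrete, here is a small numerical check in NumPy; the data are arbitrary and only for illustration.)

```python
import numpy as np

rng = np.random.default_rng(0)

# Arbitrary example: N = 3 groups of points in R^k with k = 4
groups = [rng.normal(loc=j, size=(10 + 5 * j, 4)) for j in range(3)]
S = np.vstack(groups)
grand_mean = S.mean(axis=0)

# SS(S): total sum of squares around the grand mean
ss_total = np.sum((S - grand_mean) ** 2)

# bSS(S_1, ..., S_N): sum_j n_j ||E[S_j] - E[S]||^2
ss_between = sum(len(g) * np.sum((g.mean(axis=0) - grand_mean) ** 2) for g in groups)

# sum_j SS(S_j): pooled within-group sum of squares
ss_within = sum(np.sum((g - g.mean(axis=0)) ** 2) for g in groups)

# The decomposition holds up to floating-point rounding
assert np.isclose(ss_total, ss_between + ss_within)
```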

Both Linear Discriminant Analysis (LDA) and Partial Least Squares Discriminant Analysis (PLSDA) share a common initial step that aims at finding a one-dimensional subspace $U$ of $\mathbb{R}^k$ such that the projection map $\pi_U$ of $\mathbb{R}^k$ onto $U$ maximizes the quantity

$$\frac{bSS(\pi_U(S_1),\ldots, \pi_U(S_N))}{\sum_{j=1}^{N} SS(\pi_U(S_j))}$$

The single vector $u$ that generates $U$ is understood as the direction of best separation for the sets $S_1,\ldots, S_N$ and represents a latent variable resulting from a linear combination of the original features (which are associated with the canonical basis of $\mathbb{R}^k$).
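
As far as I understand, in the LDA case this maximization reduces to a generalized eigenproblem for the between- and within-group scatter matrices (PLSDA arrives at its direction by a different route). Here is a minimal NumPy/SciPy sketch of that reduction, with variable names of my own choosing:

```python
import numpy as np
from scipy.linalg import eigh

def best_separating_direction(X, y):
    """First discriminant direction u maximizing
    bSS(pi_U(S_1), ..., pi_U(S_N)) / sum_j SS(pi_U(S_j)),
    obtained as the leading generalized eigenvector of (S_B, S_W).
    Assumes the pooled within-group scatter S_W is invertible."""
    grand_mean = X.mean(axis=0)
    k = X.shape[1]
    S_B = np.zeros((k, k))  # between-groups scatter matrix
    S_W = np.zeros((k, k))  # pooled within-group scatter matrix
    for label in np.unique(y):
        Xj = X[y == label]
        mj = Xj.mean(axis=0)
        d = (mj - grand_mean)[:, None]
        S_B += len(Xj) * d @ d.T
        S_W += (Xj - mj).T @ (Xj - mj)
    # Generalized eigenproblem S_B u = lambda S_W u; take the top eigenvector
    eigvals, eigvecs = eigh(S_B, S_W)
    u = eigvecs[:, -1]
    return u / np.linalg.norm(u)
```

The returned $u$ spans the subspace $U$; projecting the data onto it gives the one-dimensional scores whose between/within ratio is maximal.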

Suppose $S_1,\ldots, S_N$ are not linearly separable in $\mathbb{R}^k$, but there is a real vector space $W$ and a non-linear function $f: \mathbb{R}^k \to W$ (such as the one computed by the hidden layers of a deep neural network) such that the images $f(S_1),\ldots, f(S_N)$ are linearly separable in $W$.

Suppose one performs LDA or PLSDA in $W$ and finds a one-dimensional subspace $U \subseteq W$ that represents the most important direction for the separation of $f(S_1),\ldots, f(S_N)$. The preimage $f^{-1}(U)$ is not, in general, a subspace of $\mathbb{R}^k$. Are there situations in which $f^{-1}(U)$ can be used to understand directions of best separation inside the original input space $\mathbb{R}^k$?
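
For concreteness, here is a minimal sketch of the kind of setup I have in mind. The fixed random ReLU layer is only a hypothetical stand-in for $f$ (a trained network's hidden layers would take its place), so the classes are not guaranteed to become exactly linearly separable in $W$, but the construction is the same:

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)

# Two classes on concentric circles in R^2: not linearly separable in the input space
n = 200
y = (np.arange(n) >= n // 2).astype(int)
theta = rng.uniform(0.0, 2.0 * np.pi, size=n)
radius = np.where(y == 0, 1.0, 3.0)
X = np.c_[radius * np.cos(theta), radius * np.sin(theta)] + 0.1 * rng.normal(size=(n, 2))

# Hypothetical stand-in for f: a fixed random ReLU layer mapping R^2 into W = R^50
A = rng.normal(size=(2, 50))
b = rng.normal(size=50)
def f(X):
    return np.maximum(X @ A + b, 0.0)

# LDA in W: with two classes, lda.coef_[0] is proportional to the discriminant
# direction spanning U, and lda.transform gives the coordinate of f(x) along U
lda = LinearDiscriminantAnalysis(n_components=1).fit(f(X), y)
u = lda.coef_[0]
scores_in_W = lda.transform(f(X))

# The preimage f^{-1}(U) = {x : f(x) is a multiple of u} is generally a curved
# subset of R^2, not a linear subspace -- that is the object the question is about.
```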

