
The KL divergence is defined as

$$D_{KL}(p \parallel q)=\sum_i p(x_i)\log\left(\frac{p(x_i)}{q(x_i)}\right)$$

Why does $D_{KL}$ not satisfy the triangle inequality?

Also, can't you make it satisfy the triangle inequality by taking the absolute value of the information at every point?


1 Answer


To prove that the KL divergence does not satisfy the triangle inequality, you just need a counterexample.

Definitions

KL divergence

Let's first recapitulate the definition of KL divergence for discrete probability distributions $p$ and $q$ (for simplicity).

$$ D_{\text{KL}}(p\parallel q) = \sum_{x\in {\mathcal {X}}} p(x)\log \left( \frac {p(x)}{q(x)} \right) $$
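For concreteness, here is a minimal sketch of this formula in Python (the function name `kl_divergence` and the list-of-probabilities representation are illustrative choices of mine, not part of the program shown at the end of this answer):

from math import log

def kl_divergence(p, q):
    # Discrete KL divergence D_KL(p || q), using the natural logarithm.
    # p and q are sequences of probabilities over the same sample space;
    # we assume q(x) > 0 wherever p(x) > 0, so every term is well defined.
    return sum(p_x * log(p_x / q_x) for p_x, q_x in zip(p, q) if p_x > 0)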

Triangle inequality

Let's also recap the definition of the triangle inequality for a distance (metric) $d$, which can be expressed as

$$ d(p,q) \leq d(p, r) + d(r, q), $$ where $p, q$ and $r$ are probability distributions.
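In code, the property we are about to test can be phrased as a small helper (the name `satisfies_triangle_inequality` is an illustrative choice; it works with any divergence or distance function `d`, for example the `kl_divergence` sketch above):

def satisfies_triangle_inequality(d, p, q, r):
    # Check d(p, q) <= d(p, r) + d(r, q) for one particular triple (p, q, r).
    # A single False result is enough to refute the triangle inequality for d.
    return d(p, q) <= d(p, r) + d(r, q)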

Proof

For simplicity, let the sample space be $\mathcal{X} = \{0, 1 \}$. Consider now the following three discrete probability distributions.

$$ p(x)= \begin{cases} \frac{1}{2}& \text{if } x = 0\\ \frac{1}{2}& \text{if } x = 1 \end{cases} $$

$$ r(x)= \begin{cases} \frac{1}{4}& \text{if } x = 0\\ \frac{3}{4}& \text{if } x = 1 \end{cases} $$

$$ q(x)= \begin{cases} \frac{1}{10}& \text{if } x = 0\\ \frac{9}{10}& \text{if } x = 1 \end{cases} $$

Let's now apply the definitions of the triangle inequality and the KL divergence to these discrete probability distributions. We want to show that the following inequality does not hold.

$$ D_{\text{KL}}(p\parallel q) \leq D_{\text{KL}}(p\parallel r) + D_{\text{KL}}(r\parallel q) $$

which can be expanded to

\begin{align} \sum_{x\in \{0, 1 \}} p(x)\log \left( \frac {p(x)}{q(x)} \right) &\leq \sum_{x\in \{0, 1 \} } p(x)\log \left( \frac {p(x)}{r(x)} \right) + \sum_{x\in \{0, 1 \}} r(x)\log \left( \frac {r(x)}{q(x)} \right) \end{align}
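Plugging in the three distributions above (and using the natural logarithm, as in the Python program below), the three sums evaluate to

\begin{align} \underbrace{\frac{1}{2}\log\frac{1/2}{1/10} + \frac{1}{2}\log\frac{1/2}{9/10}}_{\approx\, 0.51} &\leq \underbrace{\frac{1}{2}\log\frac{1/2}{1/4} + \frac{1}{2}\log\frac{1/2}{3/4}}_{\approx\, 0.14} + \underbrace{\frac{1}{4}\log\frac{1/4}{1/10} + \frac{3}{4}\log\frac{3/4}{9/10}}_{\approx\, 0.09} \end{align}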

which roughly corresponds to

\begin{align} 0.51 &\leq 0.14 + 0.09 \\ 0.51 &\leq 0.24 \end{align} which is clearly false.

A single counterexample is enough: the triangle inequality fails for this particular choice of $p$, $r$ and $q$, so it cannot hold in general, and therefore the KL divergence does not satisfy the triangle inequality.

$$\tag*{$\blacksquare$}$$

For reproducibility, I have used the following Python (3.7) program to compute the KL divergences.

from math import log

# D_KL(p || q): the left-hand side of the triangle inequality.
one = (1 / 2 * log((1 / 2) / (1 / 10))) + (1 / 2 * log((1 / 2) / (9 / 10)))
# D_KL(p || r)
two = (1 / 2 * log((1 / 2) / (1 / 4))) + (1 / 2 * log((1 / 2) / (3 / 4)))
# D_KL(r || q)
three = (1 / 4 * log((1 / 4) / (1 / 10))) + (3 / 4 * log((3 / 4) / (9 / 10)))

print(one)                 # ~0.51
print(two)                 # ~0.14
print(three)               # ~0.09
print(two + three)         # ~0.24
print(one <= two + three)  # False: the triangle inequality is violated
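As an optional cross-check (not part of the original program, and requiring SciPy to be installed), scipy.stats.entropy computes the same KL divergence in nats when given two distributions:

from scipy.stats import entropy

p = [1 / 2, 1 / 2]
r = [1 / 4, 3 / 4]
q = [1 / 10, 9 / 10]

# entropy(pk, qk) returns D_KL(pk || qk) using the natural logarithm.
print(entropy(p, q))                                    # ~0.51
print(entropy(p, r) + entropy(r, q))                    # ~0.24
print(entropy(p, q) <= entropy(p, r) + entropy(r, q))   # False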