Questions tagged [kl-divergence]

For questions related to the Kullback–Leibler (KL) divergence, a measure of how one probability distribution (density or mass function) diverges from another. It is not a metric (it is sometimes called a pre-metric): it is not symmetric and does not satisfy the triangle inequality. It is commonly used in many machine learning settings, e.g. in the context of variational autoencoders (VAEs).

25 questions
19
votes
1 answer

Why has the cross-entropy become the classification standard loss function and not Kullback-Leibler divergence?

The cross-entropy is identical to the KL divergence plus the entropy of the target distribution. The KL divergence equals zero when the two distributions are the same, which seems more intuitive to me than the entropy of the target distribution,…
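The identity behind this question is $H(p, q) = H(p) + D_{KL}(p \| q)$; a quick NumPy check with made-up distributions:

```python
import numpy as np

# Two made-up discrete distributions over 4 outcomes.
p = np.array([0.1, 0.4, 0.3, 0.2])      # target distribution
q = np.array([0.25, 0.25, 0.25, 0.25])  # predicted distribution

cross_entropy = -np.sum(p * np.log(q))
entropy_p = -np.sum(p * np.log(p))
kl_pq = np.sum(p * np.log(p / q))

# H(p, q) = H(p) + D_KL(p || q), so the two sides should match.
print(cross_entropy, entropy_p + kl_pq)  # both ≈ 1.3863
```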
7
votes
2 answers

How is this Pytorch expression equivalent to the KL divergence?

I found the following PyTorch code (from this link): -0.5 * torch.sum(1 + sigma - mu.pow(2) - sigma.exp()), where mu is the mean parameter output by the model and sigma is the variance parameter output by the encoder. This expression is apparently…
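For context, the snippet matches the closed-form KL between a diagonal Gaussian and $\mathcal{N}(0, I)$ when the variable it calls sigma holds the log-variance. A sketch with hypothetical values that checks the formula against torch.distributions:

```python
import torch
from torch.distributions import Normal, kl_divergence

torch.manual_seed(0)
mu = torch.randn(8)      # hypothetical encoder means
logvar = torch.randn(8)  # hypothetical log-variances ('sigma' in the snippet)

# Closed-form KL( N(mu, sigma^2) || N(0, 1) ), summed over latent dimensions.
kl_closed = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())

# The same quantity via torch.distributions, as a sanity check.
std = (0.5 * logvar).exp()
kl_lib = kl_divergence(Normal(mu, std), Normal(0.0, 1.0)).sum()

print(kl_closed.item(), kl_lib.item())  # the two numbers agree
```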
7
votes
2 answers

Why is KL divergence used so often in Machine Learning?

The KL divergence is quite easy to compute in closed form for simple distributions (such as Gaussians), but it has some not-very-nice properties. For example, it is not symmetric (thus it is not a metric) and it does not respect the triangular…
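A short NumPy illustration of the asymmetry mentioned in this question, with made-up distributions:

```python
import numpy as np

def kl(p, q):
    """Discrete KL divergence D_KL(p || q) in nats."""
    return np.sum(p * np.log(p / q))

p = np.array([0.9, 0.1])
q = np.array([0.5, 0.5])

print(kl(p, q))  # ≈ 0.368
print(kl(q, p))  # ≈ 0.511  ->  D_KL(p||q) != D_KL(q||p)
```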
6
votes
1 answer

Why is the evidence equal to the KL divergence plus the loss?

Why is the equation $$\log p_{\theta}(x^1,...,x^N)=D_{KL}(q_{\theta}(z|x^i)||p_{\phi}(z|x^i))+\mathbb{L}(\phi,\theta;x^i)$$ true, where $x^i$ are data points and $z$ are latent variables? I was reading the original variational autoencoder paper and I…
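For reference, the per-datapoint decomposition in the VAE paper (note it places $\phi$ on $q$ and $\theta$ on $p$, the reverse of the excerpt) follows in one step from Bayes' rule applied inside the expectation:

$$\log p_{\theta}(x^{(i)}) = \underbrace{\mathbb{E}_{q_{\phi}(z|x^{(i)})}\left[\log\frac{p_{\theta}(x^{(i)},z)}{q_{\phi}(z|x^{(i)})}\right]}_{\mathcal{L}(\theta,\phi;x^{(i)})} + \underbrace{D_{KL}\left(q_{\phi}(z|x^{(i)})\,\|\,p_{\theta}(z|x^{(i)})\right)}_{\ge 0}$$

Since the KL term is non-negative, the first term (the ELBO $\mathcal{L}$) is a lower bound on the log-evidence.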
5
votes
1 answer

Why is the Jensen-Shannon divergence preferred over the KL divergence in measuring the performance of a generative network?

I have read articles on how the Jensen-Shannon divergence is preferred over the Kullback-Leibler divergence for measuring how well a distribution mapping is learned in a generative network, because the JS divergence better measures distribution similarity…
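A minimal NumPy sketch of the JS divergence, which symmetrizes KL via the mixture $m = (p+q)/2$ and stays finite even on disjoint supports:

```python
import numpy as np

def kl(p, q):
    # Treat 0 * log(0/q) as 0 to stay safe at the support boundary.
    mask = p > 0
    return np.sum(p[mask] * np.log(p[mask] / q[mask]))

def js(p, q):
    """Jensen-Shannon divergence: symmetrized, smoothed KL via the mixture m."""
    m = 0.5 * (p + q)
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

p = np.array([1.0, 0.0])
q = np.array([0.0, 1.0])

# KL blows up on disjoint supports, while JS is bounded by log 2.
print(js(p, q))  # = log(2) ≈ 0.693
```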
5
votes
2 answers

What are the advantages of the Kullback-Leibler divergence over the MSE/RMSE?

I've recently encountered several articles recommending the KL divergence instead of the MSE/RMSE as the loss function when trying to learn a probability distribution, but none of them gives a clear reasoning why…
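One commonly cited reason, sketched numerically with made-up values: KL punishes putting vanishing probability on outcomes the target considers possible, while the MSE saturates.

```python
import numpy as np

target = np.array([0.5, 0.5])

for eps in [0.1, 0.01, 0.001]:
    pred = np.array([1.0 - eps, eps])
    mse = np.mean((target - pred) ** 2)
    kl = np.sum(target * np.log(target / pred))
    print(f"eps={eps}: MSE={mse:.4f}  KL={kl:.4f}")

# MSE saturates near 0.25 while KL keeps growing (-> infinity as eps -> 0):
# the KL loss keeps pushing mass onto outcomes the target says are possible.
```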
4
votes
1 answer

What is the impact of scaling the KL divergence and reconstruction loss in the VAE objective function?

Variational autoencoders have two components in their loss function. The first component is the reconstruction loss, which, for image data, is the pixel-wise difference between the input image and the output image. The second component is the…
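A minimal sketch of a $\beta$-weighted VAE objective in the style of $\beta$-VAE (hypothetical tensors; the function name and the MSE reconstruction term are my choices):

```python
import torch
import torch.nn.functional as F

def vae_loss(x, x_recon, mu, logvar, beta=1.0):
    """Reconstruction term plus a beta-scaled KL term. beta=1 is the plain
    VAE; beta>1 emphasizes matching the prior, beta<1 emphasizes
    reconstruction quality."""
    recon = F.mse_loss(x_recon, x, reduction="sum")  # pixel-wise difference
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return recon + beta * kl
```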
3
votes
1 answer

How do you calculate KL divergence on a three-dimensional space for a Variational Autoencoder?

I'm trying to implement a variational auto-encoder (as seen in Section 3.1 here: https://arxiv.org/pdf/2004.06271.pdf). It differs from a traditional VAE because it encodes its input images to three-dimensional latent feature maps. In other words,…
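Assuming a diagonal Gaussian at every latent location (my reading of the usual setup), the closed form applies elementwise; you then sum over channel and spatial dimensions. A sketch with hypothetical shapes:

```python
import torch

batch, c, h, w = 4, 16, 8, 8          # hypothetical 3D latent shape
mu = torch.randn(batch, c, h, w)      # per-location means from the encoder
logvar = torch.randn(batch, c, h, w)  # per-location log-variances

# Elementwise closed-form KL to N(0, I); sum over channels and spatial
# dimensions, then average over the batch.
kl_map = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp())
kl = kl_map.sum(dim=(1, 2, 3)).mean()
print(kl)
```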
3
votes
1 answer

Are there some notions of distance between two policies?

I want to determine some distance between two policies $\pi_1 (a \mid s)$ and $\pi_2 (a \mid s)$, i.e. something like $\vert \vert \pi_1 (a \mid s) - \pi_2(a \mid s) \vert \vert$, where $\pi_i (a\mid s)$ is the vector $(\pi_i (a_1 \mid s), \dots,…
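One common choice (a sketch, not the only option): the KL between the two action distributions at each state, averaged over states. Made-up tabular policies:

```python
import numpy as np

# Hypothetical tabular policies: rows are states, columns are action probs.
pi1 = np.array([[0.7, 0.2, 0.1],
                [0.1, 0.8, 0.1]])
pi2 = np.array([[0.6, 0.3, 0.1],
                [0.3, 0.4, 0.3]])

# Per-state KL( pi1(.|s) || pi2(.|s) ), then a (uniform) average over states.
kl_per_state = np.sum(pi1 * np.log(pi1 / pi2), axis=1)
print(kl_per_state)         # one value per state
print(kl_per_state.mean())  # a scalar "distance" between the policies
```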
3
votes
2 answers

When should one prefer using total variation divergence over KL divergence in RL?

In RL, both the KL divergence ($D_{KL}$) and the total variation divergence ($D_{TV}$) are used to measure the distance between two policies. I'm most familiar with using $D_{KL}$ as an early-stopping metric during policy updates to ensure the new policy doesn't…
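For reference, $D_{TV}$ is half the $L^1$ distance and is controlled by $D_{KL}$ through Pinsker's inequality, $D_{TV} \le \sqrt{D_{KL}/2}$; a quick check:

```python
import numpy as np

p = np.array([0.7, 0.2, 0.1])
q = np.array([0.4, 0.4, 0.2])

kl = np.sum(p * np.log(p / q))
tv = 0.5 * np.sum(np.abs(p - q))

# Pinsker's inequality: TV <= sqrt(KL / 2), so small KL forces small TV,
# but TV stays bounded (<= 1) even when KL blows up.
print(tv, np.sqrt(kl / 2))  # ≈ 0.300 <= 0.303
```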
3
votes
1 answer

What is the reason for mode collapse in GAN as opposed to WGAN?

In this article I am reading: $D_{KL}$ gives us infinity when the two distributions are disjoint. The value of $D_{JS}$ has a sudden jump and is not differentiable at $\theta=0$. Only the Wasserstein metric provides a smooth measure, which is super helpful for a…
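A toy illustration of the article's point, using point masses at $0$ and $\theta$ (disjoint supports whenever $\theta \neq 0$) and scipy's empirical 1-Wasserstein distance:

```python
import numpy as np
from scipy.stats import wasserstein_distance

for theta in [2.0, 1.0, 0.5, 0.0]:
    # KL( delta_0 || delta_theta ) is +inf whenever the supports are
    # disjoint (theta != 0), and 0 at theta = 0.
    kl = 0.0 if theta == 0.0 else np.inf
    # The 1-Wasserstein distance decays smoothly with |theta|.
    w = wasserstein_distance([0.0], [theta])
    print(f"theta={theta}: KL={kl}, W1={w}")
```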
3
votes
1 answer

Why does the KL divergence not satisfy the triangle inequality?

The KL divergence is defined as $$D_{KL}=\sum_i p(x_i)\log\left(\frac{p(x_i)}{q(x_i)}\right)$$ Why does $D_{KL}$ not satisfy the triangle inequality? Also, can't you make it satisfy the triangle inequality by taking the absolute value of the…
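A concrete counterexample with three Bernoulli distributions, showing $D_{KL}(p\|r) > D_{KL}(p\|q) + D_{KL}(q\|r)$:

```python
import numpy as np

def kl(p, q):
    return np.sum(p * np.log(p / q))

# Three Bernoulli distributions written as [P(0), P(1)].
p = np.array([0.5, 0.5])
q = np.array([0.1, 0.9])
r = np.array([0.01, 0.99])

lhs = kl(p, r)               # ≈ 1.61
rhs = kl(p, q) + kl(q, r)    # ≈ 0.51 + 0.14 ≈ 0.66
print(lhs, rhs, lhs <= rhs)  # False: the triangle inequality fails
```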
2
votes
1 answer

How is this statement from a TensorFlow implementation of a certain KL-divergence formula related to the corresponding formula?

I am trying to understand a certain KL-divergence formula (which can be found on page 6 of the paper Evidential Deep Learning to Quantify Classification Uncertainty) and found a TensorFlow implementation for it. I understand most parts of the…
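For reference, the KL term in that paper is between $\text{Dir}(\tilde{\alpha})$ and the uniform Dirichlet $\text{Dir}(1,\dots,1)$. A NumPy/SciPy transcription of that closed form (my reading; worth double-checking against page 6 of the paper):

```python
import numpy as np
from scipy.special import gammaln, digamma

def kl_dirichlet_uniform(alpha):
    """D_KL( Dir(alpha) || Dir(1,...,1) ) in closed form (my transcription)."""
    K = alpha.shape[-1]
    a0 = alpha.sum(axis=-1, keepdims=True)  # Dirichlet strength
    return (gammaln(a0.squeeze(-1)) - gammaln(K)
            - gammaln(alpha).sum(axis=-1)
            + ((alpha - 1.0) * (digamma(alpha) - digamma(a0))).sum(axis=-1))

print(kl_dirichlet_uniform(np.array([1.0, 1.0, 1.0])))  # 0: same distribution
print(kl_dirichlet_uniform(np.array([5.0, 1.0, 1.0])))  # ≈ 1.24
```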
2
votes
1 answer

How does the Kullback-Leibler divergence give "knowledge gained"?

I'm reading about the KL divergence on Wikipedia. I don't understand how the equation gives "information gained", as it says in the "Interpretations" section: Expressed in the language of Bayesian inference, $D_{\text{KL}}(P\parallel…
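A toy Bayesian update illustrating that interpretation: $D_{KL}(\text{posterior}\,\|\,\text{prior})$ in bits is the information gained from the observation. Made-up numbers:

```python
import numpy as np

# Toy Bayesian update: which of two coins (fair vs biased) are we holding?
prior = np.array([0.5, 0.5])       # P(fair), P(biased)
like_heads = np.array([0.5, 0.9])  # P(heads | coin)

# Observe heads; apply Bayes' rule.
posterior = prior * like_heads
posterior /= posterior.sum()

# D_KL(posterior || prior) in bits: information gained by seeing heads.
kl_bits = np.sum(posterior * np.log2(posterior / prior))
print(posterior, kl_bits)  # posterior ≈ [0.357, 0.643], gain ≈ 0.06 bits
```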
1
vote
0 answers

How to compare different trajectories in a Markov Decision Process?

I realize that my question is a bit fuzzy and I am sorry for that. If needed, I will try to make it more rigorous and precise. Let $\mathcal{M}$ be a Markov Decision Process, with state space $\mathcal{S}$ and action space $\mathcal{A}$. Let $\tau =…