Applications of Information Theory in Machine Learning

Question

How is information theory applied to machine learning, and in particular to deep learning, in practice? I'm more interested in concepts that yielded concrete innovations in ML, rather than theoretical constructions.

Note that I'm aware that basic concepts such as entropy are used for training decision trees, and so on. I'm looking for applications that use slightly more advanced concepts from information theory, whatever they are.

nbro · Accepted Answer · 2020-11-23T01:57:17.827

Apart from the entropy and the cross-entropy, which are widely used in deep learning and which you seem to be aware of, there is also the Kullback-Leibler (KL) divergence (also known as relative entropy). It is widely used in the context of variational Bayesian neural networks and variational auto-encoders, because it is often part of the loss function that is minimized, i.e. the negative Evidence Lower BOund (ELBO). The ELBO is a proxy objective: maximizing it is equivalent to minimizing the KL divergence between the approximate (variational) posterior and the true posterior, and the ELBO itself contains a KL term between the approximate posterior and the prior. (The negative ELBO can also be interpreted as a description length, which links this objective to the minimum description length principle.) See this answer for more details.

There is also the mutual information, which has been used as a measure of uncertainty in the context of Bayesian neural networks.
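To make the two quantities concrete, here is a minimal NumPy sketch, not taken from any particular library or paper implementation. It assumes a diagonal Gaussian approximate posterior and a standard normal prior for the KL term (the usual setup in variational auto-encoders), and it assumes the mutual information is estimated from class probabilities produced by several stochastic forward passes (for example, weight samples from a Bayesian neural network). The function names and array shapes are illustrative choices.

```python
import numpy as np


def kl_diag_gaussian_to_standard_normal(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ), summed over the last axis.

    Closed form: 0.5 * sum(sigma^2 + mu^2 - 1 - log(sigma^2)).
    """
    mu = np.asarray(mu, dtype=float)
    log_var = np.asarray(log_var, dtype=float)
    return 0.5 * np.sum(np.exp(log_var) + mu**2 - 1.0 - log_var, axis=-1)


def entropy(probs, eps=1e-12):
    """Shannon entropy (in nats) of categorical distributions on the last axis."""
    probs = np.asarray(probs, dtype=float)
    # eps avoids log(0) for classes with zero probability.
    return -np.sum(probs * np.log(probs + eps), axis=-1)


def mutual_information(sampled_probs):
    """Mutual information between the prediction and the model parameters.

    sampled_probs has shape (num_samples, ..., num_classes): one categorical
    distribution per sampled set of weights (assumed setup).
    MI = H[mean prediction] - mean of H[each sampled prediction].
    """
    sampled_probs = np.asarray(sampled_probs, dtype=float)
    predictive = sampled_probs.mean(axis=0)
    return entropy(predictive) - entropy(sampled_probs).mean(axis=0)


if __name__ == "__main__":
    # KL is zero when the approximate posterior equals the prior.
    print(kl_diag_gaussian_to_standard_normal([0.0, 0.0], [0.0, 0.0]))
    # Samples that disagree completely give high mutual information.
    print(mutual_information([[1.0, 0.0], [0.0, 1.0]]))
```

In a variational auto-encoder, the KL term above would be added to a reconstruction loss to form the negative ELBO. The mutual information is close to zero when all sampled predictions agree and grows as they disagree, which is why it can serve as a measure of the model's (epistemic) uncertainty.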
