How is information theory applied to machine learning, and in particular to deep learning, in practice? I'm more interested in concepts that yielded concrete innovations in ML, rather than theoretical constructions.
Note that, I'm aware that basic concepts such as entropy is used for training decision trees, and so on. I'm looking for applications which use slightly more advanced concepts from information theory, whatever they are.