
I've heard that prediction is equivalent to data compression. Is there a way to take a compression function and use it to create an AI that predicts?

nbro

2 Answers


The way some (not all) compression algorithms work is that they encode frequent events with short codes and rarer events with longer codes. Overall, you save more space by encoding the common elements cheaply than you spend encoding the rare ones. One example is a Huffman code, which assigns each item a variable-length code based on its frequency.
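To make that concrete, here is a minimal sketch in Python of how a Huffman code ends up giving shorter codes to more frequent symbols. The huffman_codes helper is purely illustrative, not part of any library:

    import heapq
    from collections import Counter

    def huffman_codes(text):
        # Count symbol frequencies; frequent symbols should end up
        # with shorter codes than rare ones.
        freq = Counter(text)
        # Heap entries: (frequency, tie-breaker, {symbol: code-so-far})
        heap = [(f, i, {sym: ""}) for i, (sym, f) in enumerate(freq.items())]
        heapq.heapify(heap)
        next_id = len(heap)
        while len(heap) > 1:
            # Pop the two least frequent subtrees and merge them,
            # prefixing their codes with '0' and '1' respectively.
            f1, _, codes1 = heapq.heappop(heap)
            f2, _, codes2 = heapq.heappop(heap)
            merged = {s: "0" + c for s, c in codes1.items()}
            merged.update({s: "1" + c for s, c in codes2.items()})
            heapq.heappush(heap, (f1 + f2, next_id, merged))
            next_id += 1
        return heap[0][2]

    codes = huffman_codes("this is an example of a huffman tree")
    for sym, code in sorted(codes.items(), key=lambda kv: len(kv[1])):
        print(repr(sym), code)  # frequent symbols print first, with short codes

On this sample string, a common symbol like the space gets a code of roughly two or three bits, while letters that appear only once need five or six.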

You can use a compression algorithm for prediction if it encodes more than one event at a time -- for example, word pairs rather than individual words. Each word pair has a code, and the common word pairs (e.g. "of the") have shorter codes than the less common ones (e.g. "of three"). For prediction, select all the word pairs that start with your known sequence (e.g. "of"). From that list, pick the pair with the shortest code, which is the most common one; in this example, "of" is more likely to be followed by "the" than by "three". After that, repeat the process with the next word, i.e. look for pairs that begin with "the".

All you need is the compression 'code book' produced during the compression process -- it is essentially a model of the data you compressed. This also works for sequences longer than pairs, of course.
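Here is a sketch of that prediction step in Python, on a made-up toy corpus. Since the most frequent pairs would receive the shortest codes, pair frequency stands in for code length; the names build_codebook and predict_next are just for illustration:

    from collections import Counter

    def build_codebook(words):
        # Count adjacent word pairs. In a real compressor the most
        # frequent pairs would get the shortest codes, so frequency
        # here stands in for code length.
        return Counter(zip(words, words[1:]))

    def predict_next(codebook, word):
        # Among all pairs starting with `word`, pick the one whose
        # code would be shortest, i.e. the most frequent pair.
        candidates = {pair: n for pair, n in codebook.items() if pair[0] == word}
        if not candidates:
            return None
        return max(candidates, key=candidates.get)[1]

    corpus = "of the people by the people of the few of three nations".split()
    codebook = build_codebook(corpus)
    print(predict_next(codebook, "of"))   # -> 'the' (more common than 'three')
    print(predict_next(codebook, "the"))  # -> 'people'

Calling predict_next repeatedly on its own output walks through the most likely continuation one word at a time, which is exactly the procedure described above.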

If you want to know more about the topic, I can recommend Managing Gigabytes by Witten, Moffat, and Bell. Great book on compression techniques.

Oliver Mason

Although this does not strictly answer your question, it is closely related: Jürgen Schmidhuber has some interesting ideas about compression and how it relates to artificial intelligence, prediction, curiosity, etc. For example, have a look at his paper Simple Algorithmic Theory of Subjective Beauty, Novelty, Surprise, Interestingness, Attention, Curiosity, Creativity, Art, Science, Music, Jokes, whose abstract states:

In this summary of previous work, I argue that data becomes temporarily interesting by itself to some self-improving, but computationally limited, subjective observer once he learns to predict or compress the data in a better way, thus making it subjectively more "beautiful". Curiosity is the desire to create or discover more non-random, non-arbitrary, "truly novel", regular data that allows for compression progress because its regularity was not yet known. This drive maximizes "interestingness", the first derivative of subjective beauty or compressibility, that is, the steepness of the learning curve. It motivates exploring infants, pure mathematicians, composers, artists, dancers, comedians, yourself, and recent artificial systems.

See also his interesting talk Juergen Schmidhuber: The Algorithmic Principle Beyond Curiosity and Creativity on YouTube.

nbro