It's not the case this time (see the accepted answer), but sometimes the extra overhead of archiving and compression can produce an archive that is larger than the original content.
This happens when the data has extremely high entropy, such as a directory full of random data or already-compressed media files.
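To see that fixed cost in isolation, here's a minimal sketch (the exact byte count varies slightly with your gzip version): compress a single byte. gzip's 10-byte header and 8-byte trailer alone guarantee the output dwarfs the input:
$ printf x | gzip | wc -c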
Example 1: Random data
$ dd if=/dev/urandom of=test bs=1M count=100
$ tar -zcf test.tar.gz test
$ tar -cf test.tar test
$ gzip -ck --best test.tar > test-best.tar.gz
$ gzip -ck --fast test.tar > test-fast.tar.gz
$ xz -ck --fast test.tar >test.tar.xz
$ xz --fast -ck test >test.xz
$ gzip --best -ck test >test.gz
$ bzip2 --best -ck test >test.bz2
$ ls -lS test*
-rw-r--r-- 1 adamhotep adamhotep 105326395 Oct 7 16:52 test.bz2
-rw-r--r-- 1 adamhotep adamhotep 104875661 Oct 7 16:49 test-fast.tar.gz
-rw-r--r-- 1 adamhotep adamhotep 104875661 Oct 7 16:48 test.tar.gz
-rw-r--r-- 1 adamhotep adamhotep 104874474 Oct 7 16:49 test-best.tar.gz
-rw-r--r-- 1 adamhotep adamhotep 104874206 Oct 7 16:51 test.gz
-rw-r--r-- 1 adamhotep adamhotep 104867840 Oct 7 16:48 test.tar
-rw-r--r-- 1 adamhotep adamhotep 104864052 Oct 7 16:50 test.tar.xz
-rw-r--r-- 1 adamhotep adamhotep 104862868 Oct 7 16:50 test.xz
-rw-r--r-- 1 adamhotep adamhotep 104857600 Oct 7 16:47 test
This created a random 100M file and then archived and compressed it in several different ways. The results are sorted by size, biggest first. As you can see, the tar container and the compression headers add measurable overhead, and the random data offers no patterns for the compressors to exploit.
The original random file is unsurprisingly the smallest here.
(I used -ck and redirected the compressors' output so you can see more clearly which file each command created; strictly speaking, this was superfluous.)
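If you'd rather quantify the overhead than eyeball it from the ls listing, a one-liner does it. This is just a sketch assuming GNU stat; 104857600 is the byte size of the original test file:
$ for f in test*; do printf '%s\t+%d bytes\n' "$f" $(( $(stat -c %s "$f") - 104857600 )); done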
Example 2: Video+Audio data
$ youtube-dl -o test.mp4 'https://www.youtube.com/watch?v=dQw4w9WgXcQ'
[youtube] dQw4w9WgXcQ: Downloading webpage
[youtube] dQw4w9WgXcQ: Downloading video info webpage
[youtube] dQw4w9WgXcQ: Extracting video information
[youtube] dQw4w9WgXcQ: Downloading js player en_US-vflOj6Vz8
[download] Destination: test.mp4
[download] 100% of 56.64MiB in 00:07
$ gzip --best -ck test.mp4 >test.mp4.gz
$ xz --fast -ck test.mp4 >test.mp4.xz
$ ls -lS test.mp4*
-rw-r--r-- 1 adamhotep adamhotep 59388616 Oct 7 16:52 test.mp4
-rw-r--r-- 1 adamhotep adamhotep 59332683 Oct 7 16:52 test.mp4.gz
-rw-r--r-- 1 adamhotep adamhotep 59320572 Oct 7 16:52 test.mp4.xz
I repeated the gzip and xz tests on this test video. There was just enough compressible metadata to barely shrink it (xz can save 68k, a whopping 0.1%!). I suspect this comes from the cues the MP4 container embeds to ensure proper streaming and audio-visual sync. (This particular video lacks subtitles.)
In short, don't compress random or compressed data.
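If you're not sure whether a file falls into that category, a cheap pre-check is to compare the gzipped size against the original before committing. This is a sketch reusing test.mp4 from above and arbitrarily demanding at least a 1% saving:
$ orig=$(wc -c < test.mp4); comp=$(gzip -c test.mp4 | wc -c)
$ [ "$comp" -lt $(( orig * 99 / 100 )) ] && echo "worth compressing" || echo "not worth it"
not worth it
Given the sizes listed above (59332683 gzipped vs. 59388616 original), the 0.1% saving falls well short of the threshold, matching the conclusion.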