3

I have a directory on Ubuntu that contains 262144 files. Each file is a 25x25 PNG image. On average these files are approximately 1.14 KB, and no file is larger than 2 KB. Yet this directory is using up 3.1 GB of disk space. How is this possible? By my calculations this directory should be using 262144 * 1140 = 298844160 bytes, which is only 0.298844 GB.

Here are the steps I followed to get this information.

I ran ls -1 -f | wc -l to count the number of files in the directory. This returns 262146 (i.e., 262144 + 1 + 1 for . and ..).

Next I ran find . -size +2k and the result was just ..

Finally I ran du -sh culprit_directory and the result shows 3.1G culprit_directory.

There are two things that I imagine could be happening:

  1. Ubuntu needs extra space to store a directory that contains a very large number of very small files. Possible, but a whole order of magnitude?
  2. I am making a mistake in my calculations and this is the expected size for the directory. Also possible, but I am unable to see where I made this mistake.

If anyone with more experience with Ubuntu's internal file storage could advise me I would greatly appreciate it.

EDIT: I have added one of the PNG files. This one is 591 bytes in size.

[Image: an example of one of the PNG files]

EDIT:

Thanks to muru's helpful comments below, I have determined that each file is actually using 12 KiB on disk, even if it only consists of a few hundred bytes. Using the new numbers, we get 262144 * 12288 = 3221225472 bytes, which is 3 GiB and accounts for the 3.1G that du reports.
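For anyone who wants to verify this on their own files, stat reports both the byte count and the allocated space (the filename below is a placeholder for one of the images):

# %s = apparent size in bytes, %b = allocated blocks, %B = size of each block unit (usually 512 bytes)
$ stat -c 'apparent: %s bytes, on disk: %b blocks of %B bytes' some_image.png

# du reports the allocated size directly; for these files it shows 12K each
$ du -h some_image.png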

I guess my new question would be how to avoid each file using so much space?

casper
  • 111
  • 3
    Try ls -A | wc -l, and du -h --max-depth 1 culprit-directory | sort -h | tail. – muru Nov 01 '14 at 19:35
  • muru, the requested outputs are 262144 and 3.1G culprit_directory (the latter took a while to finish). – casper Nov 01 '14 at 19:40
  • Check your trash; if deleted files (related to your directory) are present there, remove them. – hunch Nov 01 '14 at 19:45
  • hunch, the trash is empty. – casper Nov 01 '14 at 19:47
  • 393216 kB is about 3.1 Gb but only 0.393216 GB. Are you sure it's GB and not Gb? – mchid Nov 01 '14 at 19:48
  • 1
    @mchid du -h should use GB and not Gb. – muru Nov 01 '14 at 19:49
  • muru, unfortunately this gives me the message: bash: /bin/mv: Argument list too long – casper Nov 01 '14 at 19:52
  • 1
    Don't test that, @casper. I also tested it with ~4000 small files. You are right: the total size of the files was ~4MB but it gave me ~49MB – αғsнιη Nov 01 '14 at 19:56
  • KasiyA, do you know how that happens? My problem is that it is actually using 3.1GB on disk, even though by all my calculations it should be one-tenth of this. I actually need to do this for 20 directories and I do not have 60GB of storage available to me. – casper Nov 01 '14 at 20:00
  • This might be it: http://unix.stackexchange.com/questions/62049/why-are-text-files-4kb (you can test it: du -h culprit-directory/* | head should give a 4.0K as the size for all of them.) – muru Nov 01 '14 at 20:01
  • @casper Actually I don't know why this happens :P ummmm ... I don't have any idea... – αғsнιη Nov 01 '14 at 20:02
  • 2
    muru, thanks for the helpful link. But if all my files were using 4.0K, that still does not explain the 10x size increase. When I run that command I get bash: /usr/bin/du: Argument list too long again :( – casper Nov 01 '14 at 20:05
  • This is very interesting... When I copy four files out of the culprit-directory and run @muru's command, I get 12K for each of the four files. That might explain it, but where is the 12K coming from? – casper Nov 01 '14 at 20:08
  • @casper you have way too many files for * to work. :) Try du -ha culprit | head – muru Nov 01 '14 at 20:08
  • @muru, you are correct, it is showing 12K for all the files! – casper Nov 01 '14 at 20:09
  • 1
    Test your block size: sudo dumpe2fs /dev/sdaX | grep 'Block size' - That's assuming you have an ext4 filesystem. /dev/sdaX is the partition which contains culprit-directory. – muru Nov 01 '14 at 20:11
  • @muru, if I understand you correctly, I ran df -h to see which partition culprit_directory is on. It is on /dev/sda7. The output of your command is then: dumpe2fs 1.42.9 (4-Feb-2014) Block size: 4096 – casper Nov 01 '14 at 20:17
  • That is not surprising I suppose. The block size is nearly always 4K. What about other sizes (try dumpe2fs -h /dev/sda7)? – muru Nov 01 '14 at 20:25
  • @muru, that gives a whole lot of output. I am not familiar enough to say what is relevant, but here is my best guess: Block size: 4096 Fragment size: 4096 Reserved GDT blocks: 1018 Blocks per group: 32768 Fragments per group: 32768 Inodes per group: 8192 Inode blocks per group: 512 Flex block group size: 16 – casper Nov 01 '14 at 20:28
  • Everything looks normal and I am stumped. This is beyond my level. One last thing: try fdisk -l /dev/sda. The sector size should be 512 bytes. – muru Nov 01 '14 at 20:31
  • @muru thanks so much for your help so far anyway. If you want to write your comments as an answer I'll mark it as accepted. – casper Nov 01 '14 at 20:33
  • @muru yes indeed: Sector size (logical/physical): 512 bytes / 4096 bytes – casper Nov 01 '14 at 20:34
  • 1
    Nah, I'd like to keep this open in case an expert does happen to come by. :) – muru Nov 01 '14 at 20:37

2 Answers

3

To answer the follow-up question: probably the easiest thing to do in these cases is to create a small ad hoc filesystem and loop-mount it. Something like this:

$ dd if=/dev/zero of=imgdisk.img bs=1M count=512
512+0 records in
512+0 records out
536870912 bytes (537 MB) copied, 0.425628 s, 1.3 GB/s
$ du -h imgdisk.img 
513M    imgdisk.img
$ mkfs.ext4 -b 2048 imgdisk.img 
mke2fs 1.42.12 (29-Aug-2014)
Discarding device blocks: done                            
Creating filesystem with 262144 2k blocks and 32768 inodes
Filesystem UUID: 8837a733-6b75-4326-bb72-9372538653ad
Superblock backups stored on blocks: 
        16384, 49152, 81920, 114688, 147456

Allocating group tables: done                            
Writing inode tables: done                            
Creating journal (8192 blocks): done
Writing superblocks and filesystem accounting information: done 

$ mkdir imgmount
$ sudo mount imgdisk.img imgmount -o loop
$ ls imgmount/
lost+found

Copy the images in there (sized as it is, it might be just too small for your files; make the 512 a 513 if so), unmount the loop filesystem, and mount it over the directory with all the images. If that works, unmount it from there, delete the original images, edit /etc/fstab so it mounts the loopback filesystem in the right place, run mount -a, and you're set.
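A rough sketch of those steps, assuming imgdisk.img and imgmount sit next to culprit_directory (the paths are placeholders; note that a plain cp with a * glob would fail with "Argument list too long" on this many files):

# Copy the directory's contents into the small-block filesystem without expanding a huge glob
$ sudo mount -o loop imgdisk.img imgmount
$ sudo cp -a culprit_directory/. imgmount/
$ sudo umount imgmount

# Mount the image over the directory (the originals are hidden, not deleted) and check the usage
$ sudo mount -o loop imgdisk.img culprit_directory
$ du -sh culprit_directory

# If that works: unmount, delete the originals, then make the mount permanent
$ sudo umount culprit_directory
$ rm -rf culprit_directory && mkdir culprit_directory
# add a line like this to /etc/fstab (use absolute paths), then run mount -a:
#   /path/to/imgdisk.img  /path/to/culprit_directory  ext4  loop  0  0
$ sudo mount -a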

Edit: you can also use -b 1024 instead of 2048.

Chipaca
  • 9,990
  • Thanks for your answer. I'm afraid I haven't yet had a chance to try this, but I will try to get to it soon! – casper Nov 03 '14 at 20:41
1

Different pieces of software report disk space usage in two ways. A file has a size, which is the number of bytes it contains, and a "size on disk" (physical size), which is the sum of the sizes of the clusters allocated to it. The cluster size, called the block size on ext4, is the minimum chunk of space the OS allocates when it stores data on disk. So if you have one file containing 1 byte and your cluster size is 8 KB, that file still occupies a minimum of 1 cluster, i.e. 8 KB.
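As a quick illustration of that (a throwaway experiment, not specific to the asker's setup):

# Create a 1-byte file and compare its byte size with the space allocated for it
$ printf 'x' > tiny.txt
$ stat -c 'size: %s bytes, allocated: %b blocks of %B bytes' tiny.txt
$ du -h tiny.txt    # on a filesystem with 4 KiB blocks this reports 4.0K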

Wasted disk space really isn't normally a problem, even as disk sizes increase and cluster sizes increase to 32 KB or 64 KB.

I guess my new question would be how to avoid each file using so much space?

Put files that are used less often into an archive file, such as a .zip file. The OS has a limit on how many clusters it can keep track of.
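For instance (the archive name is just an example; tar would work equally well):

# Pack the whole directory into one archive, then the loose files can be removed
$ zip -r -q images.zip culprit_directory
# Individual files can still be extracted on demand
$ unzip -p images.zip culprit_directory/some_image.png > some_image.png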

Maybe someone else can explain the limit on cluster sizes for several OSs.

Bulrush
  • 772
  • Thanks @Bulrush, but I can't zip these files. I have a program that needs access to all of them at the same time. – casper Nov 01 '14 at 20:54
  • If you see what @Chipaca did, he simply created a filesystem with cluster sizes of 2kb (called "block sizes" here). See the step with 'mkfs.ext4', it says "Creating filesystem with 262144 2k blocks and..." – Bulrush Nov 03 '14 at 20:20