
I want all the files in an ext4 filesystem to be unfragmented, because of reasons. Sadly, e4defrag (recommended here: How to defrag an ext4 filesystem) fails to defragment several files. What are my alternatives?

The filesystem already has all of its files in it (they are not going to be changed in any way) and it is almost full. There is some free space (according to df -h: 434M available out of 85G, with 80G used) which can be used as a buffer. I do not need the filesystem mounted while defragmenting. Moreover, I have another filesystem available with enough space to use as a buffer.

One idea I have is to move the files to another filesystem and then copy them back, somehow telling the filesystem to store them contiguously.

[EDIT]

I have just found that I cannot rely on e4defrag's output. It counts files with more than one extent as fragmented, even though it knows that the extents are contiguous:

$ sudo filefrag file.file
file.file: 1 extent found
$ sudo e4defrag -vc file.file
e4defrag 1.45.5 (07-Jan-2020)
<File>
[ext 1]: start 22388736: logical 0: len 32768
[ext 2]: start 22421504: logical 32768: len 32768
[ext 3]: start 22454272: logical 65536: len 32768
[ext 4]: start 22487040: logical 98304: len 27962

 Total/best extents                             4/1
 Average size per extent                        126266 KB
 Fragmentation score                            0
 [0-30 no problem: 31-55 a little bit fragmented: 56- needs defrag]
 This file (file.file) does not need defragmentation.
 Done.
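
For the record, the four extents above are physically back-to-back: each one starts exactly where the previous one ends (32768 blocks is the maximum length of a single ext4 extent, which is why the file is split into four of them). A quick shell check confirms it:

$ echo $((22388736 + 32768))   # end of ext 1 = start of ext 2
22421504
$ echo $((22421504 + 32768))   # end of ext 2 = start of ext 3
22454272
$ echo $((22454272 + 32768))   # end of ext 3 = start of ext 4
22487040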

abukaj
  • There is no need for defragmentation in ext4. – Mahler Nov 13 '22 at 12:08
  • Is the purpose of this to shrink an ext4 virtual disk image? – matigo Nov 13 '22 at 12:22
  • Does this answer your question? How to defrag an ext4 filesystem – graham Nov 13 '22 at 12:23
  • @24601 No, I found that some time ago. e4defrag fails and I cannot use the gparted resize trick due to lack of space on the device. – abukaj Nov 13 '22 at 12:33
  • @Mahler then what is your way to have all files contiguous? – abukaj Nov 13 '22 at 12:38
  • I have an SSD, and defragmentation is bad for it. And for an HDD, defragmentation is needed only if there are problems. – Mahler Nov 13 '22 at 12:52
  • @Mahler The problem is that ‘problems’ are very much contextual. Defragmenting files on ext4 can in fact give measurably better performance for sequential or large reads, even on flash storage (making thousands of small requests is still more expensive than making one big one, even if you have no seek time). – Austin Hemmelgarn Nov 14 '22 at 02:31
  • This sounds very much like an XY problem. If you can explain why you think you need exactly zero fragmentation, people can probably give you a better answer. Based on your comments about the filesystem reasonably being read-only though, ext4 is probably not the best choice here, and you should be looking at either SquashFS or EROFS instead (or possibly CramFS, but that’s inferior to either SquashFS or EROFS in most ways). – Austin Hemmelgarn Nov 14 '22 at 02:35
  • A solid-state drive (SSD) does not have a read/write head, so defragmentation does not make sense. In an SSD, data is stored on memory chips, so it can be read much faster. – Mahler Nov 14 '22 at 16:06
  • @AustinHemmelgarn Thanks for the suggestion - I will look into that if I manage to get rid of the ext4 requirement and not the no fragmentation requirement ("reasons"). But my question is purely technical here: "how to do Y". If I wanted to have X solved, I would have asked for it explicitly. – abukaj Nov 15 '22 at 08:35
  • @Mahler http://www.hanselman.com/blog/the-real-and-complete-story-does-windows-defragment-your-ssd – abukaj Nov 15 '22 at 08:39
  • @Mahler Each individual IO request still has a cost in the OS. It’s not ‘free’ to ask the drive for data: the OS has to set up the region of memory which the data will be transferred to, actually send the request to the storage device (which can be very time consuming depending on how it’s connected), wait for completion (yes, there is a wait, even for an SSD), and then once it has the data clean everything up. That overhead is per-request, so by issuing larger requests you get lower overhead and faster bulk data transfers. – Austin Hemmelgarn Nov 15 '22 at 12:47
  • When working with SSD drives, modern versions of Windows disable defragmentation by default, using the TRIM function instead. – Mahler Nov 15 '22 at 20:11
  • There is the fstrim utility on Linux. – Mahler Nov 15 '22 at 20:19
  • @Mahler How does TRIM fix the issue of maximum file fragmentation? – abukaj Nov 16 '22 at 11:51
  • TRIM is used instead of defrag for SSDs. – Mahler Nov 16 '22 at 14:49
  • @Mahler TRIM erases SSD pages. How does it keep file fragmentation below maximum? I mean: If an SSD gets too fragmented you can hit maximum file fragmentation (when the metadata can’t represent any more file fragments) which will result in errors when you try to write/extend a file. Not to mention that more file fragments means more metadata to process while reading/writing a file, which can lead to slower performance. – abukaj Nov 16 '22 at 14:56
  • I don't use defrag or TRIM. My SSD is half full, and no problems have been noticed yet. – Mahler Nov 16 '22 at 15:11
  • Possibly Ubuntu does it automatically. – Mahler Nov 16 '22 at 15:19

3 Answers


I want all the files in an ext4 filesystem to be unfragmented, because of reasons.

While there are legitimate reasons to defrag, none require every single file to be defragged and contiguous. The main reason anyone might want every file to be defragged is OCPD-related, and it is a complete waste of time because the file system will become "fragmented" again shortly after being mounted rw.

The filesystem... is almost full...

In that scenario, you probably will not be able to defrag every file: Linux defrag programs tend to work at the file level, and you do not necessarily have enough contiguous free space for every file.
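
If you want to see how scattered that remaining 434M of free space actually is, e2freefrag (from e2fsprogs) reports the size distribution of the free extents. A minimal sketch, where /dev/sdXN is a placeholder for the partition holding the filesystem:

# report how large the contiguous runs of free blocks are
$ sudo e2freefrag /dev/sdXN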

One idea I have is to move the files to another filesystem and then copy them back...

That is your most viable option. However, specific file allocation is determined by the file system driver.
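
Whatever you end up doing, you can at least verify the result. A minimal check, assuming the filesystem is mounted at /mnt/data (a placeholder), lists every file that still spans more than one extent:

# print only files that filefrag does not report as a single extent
$ sudo find /mnt/data -type f -exec filefrag {} + | grep -v ': 1 extent found'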


To reorder blocks it is enough to have one free block as a buffer.

Linux file system devs have not given defrag the same priority that Windows devs have. So the problem is not so much that it is technically impossible, but that no one has bothered to write any programs to do so.

The fs may be set to ro after defragmentation.

Then use a filesystem designed for ro use, like squashfs. All files will be defragged, contiguous, and even compressed.
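
A minimal sketch of that route, assuming the files currently live in a tree mounted at /mnt/data and the image is written to the other filesystem you mentioned (both paths are placeholders):

# build a compressed, read-only image containing every file in the tree
$ sudo mksquashfs /mnt/data /mnt/buffer/files.squashfs
# mount the image read-only wherever it is needed
$ sudo mkdir -p /mnt/squash
$ sudo mount -o loop,ro /mnt/buffer/files.squashfs /mnt/squash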

xiota
  • "the file system will become "fragmented" again shortly after being mounted rw" As I mentioned, no file is going to be changed. The fs may be set to ro after defragmentation. "will not be able to defrag every file" May you elaborate why is it so? To reorder blocks it is enough to have one free block as a buffer. "That is your most viable option." How do I tell the ext4 not to leave empty blocks between files? – abukaj Nov 13 '22 at 13:58
  • Edited to answer some of your questions. – xiota Nov 13 '22 at 14:09

If some of your files are big, it might be technically impossible to defragment them all without reformatting the filesystem.

Any ext4 filesystem is composed of a sequence of block groups. By default, each block group is 128 MiB long.

Each block group starts with a bunch of filesystem metadata (superblock, group descriptor, allocation bitmaps, and inode tables) followed by actual data blocks used by files belonging to that block group. This means that filesystem metadata are scattered mostly uniformly across the entire device.

However, thanks to the optional flex_bg feature, several block groups can be aggregated together into a single bigger one. mke2fs has been creating filesystems by default with 16 block groups packed together since 2008-ish. Assuming you haven't changed this when making the filesystem using the -G option to mkfs, your filesystem is thus likely split into 2-GiB flex groups.

Unless all your files are significantly smaller than 2 GiB, you would thus inevitably run into a situation where the next file to store would have to be fragmented across two or more (flex) block groups. Of course this is guaranteed to happen if any of your files is bigger than the usable data blocks in a (flex) block group.

To achieve your goal, you will thus likely have to reformat the filesystem with a much higher setting of the -G option than the default 16 to make the filesystem use really big flex block groups.
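
A sketch of what that looks like in practice (device names are placeholders, and mkfs.ext4 wipes the filesystem, so the files must be copied elsewhere first):

# check the current flex group size (the default 16 means 16 x 128 MiB = 2 GiB flex groups)
$ sudo tune2fs -l /dev/sdXN | grep -i flex
# recreate the filesystem with much larger flex groups, e.g. 512 block groups = 64 GiB each
# WARNING: this destroys everything on the filesystem
$ sudo mkfs.ext4 -G 512 /dev/sdXN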

TooTea
  • Now I wonder why e4defrag reports "success" for files 900M+. ;) – abukaj Nov 13 '22 at 21:23
  • @abukaj Not sure what exactly you mean by "reporting success", e4defrag can print several different messages meaning different things. But in general, e4defrag is smart enough to calculate the "best possible number of fragments a given file could have with a given block group size" and uses that to find if a file is worth trying to defragment further. (See the source of get_best_count().) – TooTea Nov 13 '22 at 22:55
  • @abukaj After further checking of the code, I must conclude that my original version of this answer was wrong. It's actually perfectly possible to make a filesystem with huge flex block groups using the standard tools (without any hacking). You might want to un-accept this answer if it doesn't apply anymore. – TooTea Nov 14 '22 at 09:25
  • I meant output like:
    e4defrag 1.45.5 (07-Jan-2020)
    ext4 defragmentation for 986M.file
    [1/1]986M.file: 100% [ OK ]
     Success: [1/1]
    Meanwhile I ran `sudo e4defrag -v .` and had 986M.file listed as a fragmented file (now/best = 1/1)... – abukaj Nov 15 '22 at 09:07

Back before DOS 6, the usual advice for defragmenting a FAT partition was:

  1. Copy the files from the partition to something else;
  2. Wipe the partition;
  3. Re-create the directory structure on the empty partition; and
  4. Copy the files back.

I never tried this because MS-DOS 6 came out (with its included defrag utility) before defragmenting came to be an issue for me.
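
Translated to the situation in the question, that old recipe might look roughly like the sketch below (the mount points and device name are placeholders to adapt, and recreating the filesystem destroys everything on it):

# 1. copy everything to the spare filesystem
$ sudo rsync -aHAX /mnt/data/ /mnt/buffer/
# 2. "wipe the partition" by recreating the filesystem
$ sudo umount /mnt/data
$ sudo mkfs.ext4 /dev/sdXN
$ sudo mount /dev/sdXN /mnt/data
# 3. and 4. copy the files back onto the fresh, empty filesystem
$ sudo rsync -aHAX /mnt/buffer/ /mnt/data/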

MDeBusk
  • The major problem with that approach is that "Linux does not need defragmentation", or rather that its filesystems are designed to avoid fragmentation. That means that while a FAT filesystem places files as close to each other as possible (which leads to file fragmentation if you append to one), extX scatters files around the disk, leaving free blocks in case they grow. So as you approach the filesystem's capacity (which should be avoided in most circumstances), your copied files start to be fragmented. – abukaj Nov 13 '22 at 16:23
  • @abukaj I'm sure you're correct. And I haven't looked into it deeply enough to offer more than the paraphrased "Four Yorkshiremen" skit. ("You had a File Allocation Table? Luxury!") Seems to me that if conventional tools aren't giving you what you want, though, and you insist on ext4, the old-school approach is something you can try.

    Now that I think of it, because you don't intend to change these files, what if you created a UDF-formatted disk image, moved the files there, and mounted it when you needed to? Would that be too slow?

    – MDeBusk Nov 13 '22 at 20:51
  • I think I will either drop ext4 or the "no fragmentation" requirement, whichever is easier (see the accepted answer). BTW: I have not known the original skit. – abukaj Nov 13 '22 at 21:24
  • @abukaj The Four Yorkshiremen is one of Monty Python's more famous sketches: four well-off men trying to one-up one another over their difficult childhoods. – MDeBusk Nov 13 '22 at 23:00