The Defragmentation Lie

It has been said for years that files on Linux do not become fragmented, so Linux does not need defragmentation. This is not true! Large files can certainly become fragmented on Linux, especially if they are written to often; files downloaded with BitTorrent are one example. Here is the proof:

    $ sudo filefrag big-500mb-file
    big-500mb-file: 4316 extents found, perfection would be 3 extents
    $ sudo sh -c "echo 3 > /proc/sys/vm/drop_caches"  # Clears fs caches and forces Linux to read from disk.
    $ time cat big-500mb-file > /dev/null

    real 0m24.842s
    user 0m0.032s
    sys 0m0.592s

That is how long it takes to read the whole file sequentially when it is heavily fragmented. Compare that with how long it takes when the file is barely fragmented:

    $ cp big-500mb-file 500mb-copy
    $ sudo filefrag 500mb-copy
    500mb-copy: 6 extents found, perfection would be 3 extents
    $ sudo sh -c "echo 3 > /proc/sys/vm/drop_caches"
    $ time cat 500mb-copy > /dev/null

    real 0m6.501s
    user 0m0.024s
    sys 0m0.508s
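To put the two timings side by side, here is a small sketch that turns them into a slowdown factor and an approximate throughput. It assumes the file is exactly 500 MB, which the file name suggests but the transcripts do not confirm, and compare_reads is just an illustrative helper name.

```shell
#!/bin/sh
# Sketch: compare two cold-cache read times for the same amount of data.
# Usage: compare_reads SIZE_MB SLOW_SECONDS FAST_SECONDS
compare_reads() {
    awk -v size="$1" -v slow="$2" -v fast="$3" 'BEGIN {
        printf "slowdown factor: %.2f\n", slow / fast
        printf "slow read: %.1f MB/s\n", size / slow
        printf "fast read: %.1f MB/s\n", size / fast
    }'
}

# Assumption: the file is exactly 500 MB (inferred from its name only).
# The two times are the "real" values from the transcripts above.
compare_reads 500 24.842 6.501
```

With those numbers the slowdown factor comes out at about 3.82, which is roughly 20 MB/s for the fragmented file against roughly 77 MB/s for the copy.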

Note that the copy is still slightly fragmented (6 extents instead of the ideal 3), possibly because other I/O operations were going on in the background. Three things can be learned from this exercise:

  • Fragmentation does matter! It took almost four times as long to read the fragmented file as it did the nearly unfragmented copy (24.8 s versus 6.5 s). The overhead could be even worse for smaller files, because seek time then dominates the read time. For example, a 2 MB file in 10 fragments could in the worst case take 10 times as long to read as the same file in one fragment.
  • BitTorrent leaves files in a heavily fragmented state, likely because thousands of writes are performed to the same file and it is hard to get them all laid out in order. What I don't understand is why it couldn't preallocate the files in advance and then write to them.
  • cp can defragment files: the copy ended up in 6 extents where the original had 4316.
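The last point can be turned into a small helper. This is only a minimal sketch: defrag_by_copy is a hypothetical name, not a standard tool. It assumes there is enough free space for a second copy, that nothing is writing to the file while it runs, and that the file has no hard links that need to survive.

```shell
#!/bin/sh
# Sketch: rewrite a file through a fresh copy so the filesystem gets a
# chance to allocate it in fewer extents. Hypothetical helper.
# Usage: defrag_by_copy FILE
defrag_by_copy() {
    file=$1
    tmp="$file.defrag.$$"    # temporary name in the same directory

    # -p keeps mode and timestamps where possible.
    cp -p -- "$file" "$tmp" || { rm -f -- "$tmp"; return 1; }

    # Only replace the original if the copy is byte-for-byte identical.
    if cmp -s -- "$file" "$tmp"; then
        mv -f -- "$tmp" "$file"
    else
        rm -f -- "$tmp"
        return 1
    fi
}
```

Running sudo filefrag on the file before and after defrag_by_copy big-500mb-file shows whether the copy actually ended up in fewer extents; that depends on how much contiguous free space the filesystem has. On copy-on-write filesystems, recent versions of GNU cp may share the existing blocks instead of copying them, in which case the layout would not change. GNU cp has a --reflink=never option to force a real copy.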