The Linux Filesystem Cache is Braindead

Say you have 2 GiB of RAM and you copy 2 GiB of data from one disk directory to another. That effectively flushes the Linux filesystem cache, and you don't even have to be root to do it. Whatever you do afterwards has to reload its files from disk, which means the system always responds slowly after copying large files.

The Linux cache subsystem does not realise that you are copying a set of large files just once, that they would not fit in the cache anyway, and that the often-used files already cached should stay there instead. The nocache tool is supposed to come to the rescue, but it's a blunt instrument: it simply turns caching off for the given operation. As a result, copying a large number of small files with it takes forever.

Well-behaved applications should provide an option to call posix_fadvise( POSIX_FADV_NOREUSE ) in order to minimise the impact of large sequential reads. Unfortunately, flag POSIX_FADV_NOREUSE has long been ignored by Linux. A patch was proposed years ago, but it seems it never made it in. If you want to check for yourself, look at kernel source file mm/fadvise.c. It is not ideal, but you can use POSIX_FADV_DONTNEED instead.

Some people suggest opening files with O_DIRECT instead, but that is problematic:
 * First of all, Linus himself says "There really is no valid reason for EVER using O_DIRECT".
 * A second consideration is that O_DIRECT bypasses the cache entirely, while POSIX_FADV_NOREUSE still allows some caching, which may actually be better, depending on the situation.
 * Yet another aspect is that O_DIRECT does no read-ahead, which can easily kill performance in many simple applications.
 * Furthermore, O_DIRECT is not part of POSIX, while posix_fadvise is a POSIX standard call and should be more portable.
 * Finally, O_DIRECT demands that buffers, file offsets, and transfer sizes be aligned to the block size, which can be difficult to achieve in a scripting language like Perl.
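
To illustrate that last point, here is a minimal sketch of what the alignment requirement looks like in C. The helper names and the 4 KiB alignment are my assumptions; 512 bytes is the traditional minimum, but 4 KiB covers modern devices.

```c
#define _GNU_SOURCE             /* O_DIRECT is a Linux extension; must precede the includes */
#include <fcntl.h>
#include <stdint.h>
#include <stdlib.h>
#include <unistd.h>

#ifndef O_DIRECT
#define O_DIRECT 040000         /* fallback value on x86/arm64 if feature macros came too late */
#endif

#define DIO_ALIGN 4096          /* assumption: 4 KiB satisfies common devices */

/* Allocate a buffer suitably aligned for O_DIRECT transfers. */
void *dio_buffer(size_t len)
{
    void *buf = NULL;
    if (posix_memalign(&buf, DIO_ALIGN, len) != 0)
        return NULL;
    return buf;
}

/* Open a file for uncached reads; fails on filesystems (e.g. tmpfs)
 * that do not support O_DIRECT. Every subsequent read must then use an
 * aligned buffer, an aligned file offset, and a transfer size that is
 * a multiple of the block size, or it fails with EINVAL. */
int dio_open(const char *path)
{
    return open(path, O_RDONLY | O_DIRECT);
}
```

A Perl script has no posix_memalign() at hand, which is why the alignment rule is such a hurdle outside of C.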

Another way to limit file cache usage is to run the program in a memory-limited cgroup. Unfortunately such a limit affects all memory usage within the cgroup, and not just the file cache. The only tool I have found that painlessly creates a temporary cgroup is systemd-run, and even this way is not without rough edges. See script background.sh in my tools repository for an attempt to make it user friendly.

For more information about Linux cache behaviour, check out rsync bug 9560 about a drop-cache option, and the nocache tool.

I have never seen this cache-flushing effect on Microsoft Windows, but I suspect that the issue is not limited to Linux. Please drop me a line if you know which other OSes are affected.