Linux filesystem with inodes close on the disk
- by pts
I'd like to make the ls -laR /media/myfs on Linux as fast as possible. I'll have 1 million files on the filesystem, 2TB of total file size, and some directories containing as much as 10000 files. Which filesystem should I use and how should I configure it?
As far as I understand, the reason why ls -laR is slow because it has to stat(2) each inode (i.e. 1 million stat(2)s), and since inodes are distributed randomly on the disk, each stat(2) needs one disk seek.
Here are some solutions I had in mind, none of which I am satisfied with:
Create the filesystem on an SSD, because the seek operations on SSDs are fast. This wouldn't work, because a 2TB SSD doesn't exist, or it's prohibitively expensive.
Create a filesystem which spans on two block devices: an SSD and a disk; the disk contains file data, and the SSD contains all the metadata (including directory entries, inodes and POSIX extended attributes). Is there a filesystem which supports this? Would it survive a system crash (power outage)?
Use find /media/myfs on ext2, ext3 or ext4, instead of ls -laR /media/myfs, because the former can the advantage of the d_type field (see in the getdents(2) man page), so it doesn't have to stat. Unfortunately, this doesn't meet my requirements, because I need all file sizes as well, which find /media/myfs doesn't print.
Use a filesystem, such as VFAT, which stores inodes in the directory entries. I'd love this one, but VFAT is not reliable and flexible enough for me, and I don't know of any other filesystem which does that. Do you? Of course, storing inodes in the directory entries wouldn't work for files with a link count more than 1, but that's not a problem since I have only a few dozen such files in my use case.
Adjust some settings in /proc or sysctl so that inodes are locked to system memory forever. This would not speed up the first ls -laR /media/myfs, but it would make all subsequent invocations amazingly fast. How can I do this? I don't like this idea, because it doesn't speed up the first invocation, which currently takes 30 minutes. Also I'd like to lock the POSIX extended attributes in memory as well. What do I have to do for that?
Use a filesystem which has an online defragmentation tool, which can be instructed to relocate inodes to the the beginning of the block device. Once the relocation is done, I can run dd if=/dev/sdb of=/dev/null bs=1M count=256 to get the beginning of the block device fetched to the kernel in-memory cache without seeking, and then the stat(2) operations would be fast, because they read from the cache. Is there a way to lock those inodes and/or blocks into memory once they have been read? Which filesystem has such a defragmentation tool?