Linux not buffering block I/O when the device is not "in use" (i.e. mounted)
- by Radek Hladík
I am installing a new server and I've run into an interesting issue. The server is running Fedora 19 (3.11.7-200.fc19.x86_64 kernel) and is supposed to host a few KVM/Qemu virtual servers (mail server, file server, etc.). The hardware is an Intel(R) Xeon(R) 5160 CPU @ 3.00GHz with 16GB of RAM.
One of the most important roles will be the Samba server, and we have decided to run it as a virtual machine with almost direct access to the disks. So the physical HDD is cached on an SSD (via bcache), then assembled into a RAID array with md, and the resulting device is exported to the virtual machine via virtio. The virtual machine is again Fedora 19 with the same kernel.
One important thing to find out is how much overhead the virtualization layer adds to disk I/O. So far I've been able to get up to 180MB/s in the VM and up to 220MB/s on the real hardware (on the SSD). I am still not sure why the overhead is that big, but it is more than the network can handle anyway, so I do not care that much.
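For reference, the data disk is handed to the guest as a raw virtio drive, roughly like this (the device path and the cache mode below are placeholders rather than my exact invocation; the point is just that the usual cache modes go through the host page cache):

qemu-system-x86_64 -enable-kvm -m 4096 \
    -drive file=/dev/md0,if=virtio,format=raw,cache=writeback    # placeholder path and cache mode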
The interesting thing is that disk reads are not buffered in the VM unless I create and mount a filesystem on the disk, or otherwise keep the disk in use.
Simply put:
Let's use dd to read the disk for the first time (/dev/vdd is an old Raptor disk, and ~70MB/s is its real sequential speed):
[root@localhost ~]# dd if=/dev/vdd of=/dev/null bs=256k count=10000 ; cat /proc/meminfo | grep Buffers
2621440000 bytes (2.6 GB) copied, 36.8038 s, 71.2 MB/s
Buffers: 14444 kB
Rereading the data shows that it is cached somewhere, but not in the buffers of the VM, and the speed only goes up to about 500MB/s. The VM has 4GB of RAM (more than the 2.6GB being read):
[root@localhost ~]# dd if=/dev/vdd of=/dev/null bs=256k count=10000 ; cat /proc/meminfo | grep Buffers
2621440000 bytes (2.6 GB) copied, 5.16016 s, 508 MB/s
Buffers: 14444 kB
[root@localhost ~]# dd if=/dev/vdd of=/dev/null bs=256k count=10000 ; cat /proc/meminfo | grep Buffers
2621440000 bytes (2.6 GB) copied, 5.05727 s, 518 MB/s
Buffers: 14444 kB
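My assumption is that those ~500MB/s rereads are served from the host's page cache. One way to double-check that would be to drop the caches on the host and repeat the dd in the guest:

sync                                 # on the host
echo 3 > /proc/sys/vm/drop_caches    # on the host: drop page cache, dentries and inodes
# then rerun the dd in the guest; if the host cache is responsible, the read should fall back to ~70MB/s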
Now let's mount the filesystem on /dev/vdd and try the dd again:
[root@localhost ~]# mount /dev/vdd /mnt/tmp
[root@localhost ~]# dd if=/dev/vdd of=/dev/null bs=256k count=10000 ; cat /proc/meminfo | grep Buffers
2621440000 bytes (2.6 GB) copied, 4.68578 s, 559 MB/s
Buffers: 2574592 kB
[root@localhost ~]# dd if=/dev/vdd of=/dev/null bs=256k count=10000 ; cat /proc/meminfo | grep Buffers
2621440000 bytes (2.6 GB) copied, 1.50504 s, 1.7 GB/s
Buffers: 2574592 kB
While the first read ran at roughly the same speed, this time all 2.6GB got buffered and the next read hit 1.7GB/s. And when I unmount the device:
[root@localhost ~]# umount /mnt/tmp
[root@localhost ~]# cat /proc/meminfo | grep Buffers
Buffers: 14452 kB
[root@localhost ~]# dd if=/dev/vdd of=/dev/null bs=256k count=10000 ; cat /proc/meminfo | grep Buffers
2621440000 bytes (2.6 GB) copied, 5.10499 s, 514 MB/s
Buffers: 14468 kB
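If I am not mistaken, the buffer cache of a block device can also be dropped explicitly with blockdev --flushbufs (the BLKFLSBUF ioctl), so with the filesystem mounted and the buffers populated I would expect this to have the same visible effect as the unmount:

blockdev --flushbufs /dev/vdd    # should invalidate the cached blocks for the device
grep Buffers /proc/meminfo       # I would expect this to drop back to ~14 MB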
bcache was disabled while testing, and the results are the same on faster (newer) HDDs and on the SSD (except for the initial read speed, of course).
To sum it up: when I read from the device via dd for the first time, the data comes from the disk. When I reread it, it is cached on the host but not in the guest (that's actually the same issue, more on that below). When I mount the filesystem and then read the device directly, the data does get cached in the VM (in buffers). As soon as I stop "using" the device, the buffers are discarded and it is no longer cached in the VM.
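If "using" the device simply means that something holds it open, then keeping a file descriptor on /dev/vdd from a shell should be enough to keep the buffers alive; that would be the obvious next test (I am only sketching it here):

exec 3</dev/vdd                                   # hold the device open read-only
dd if=/dev/vdd of=/dev/null bs=256k count=10000
grep Buffers /proc/meminfo                        # should stay high while fd 3 is open
exec 3<&-                                         # close the descriptor again
grep Buffers /proc/meminfo                        # I would expect the buffers to be dropped here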
When I looked at the Buffers value on the host, I realized that the situation is the same there: block I/O gets buffered only while the disk is in use, which in this case means "exported to a running VM".
On the host, after all the measurements were done:
3165552 buffers
On the host, after the VM shutdown:
119176 buffers
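While the VM is running, the only thing keeping the exported device open on the host should be the qemu process; that can be checked with lsof or fuser (the md device name below is just a placeholder for the real one):

lsof /dev/md0       # should list whatever holds the device open (presumably qemu)
fuser -v /dev/md0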
I know this is not important in practice, as the disks will be mounted all the time, but I am curious and would like to know why it works like this.