Why do I see a large performance hit with DRBD?
- by BHS
I see a much larger performance hit with DRBD than their user manual says I should get. I'm using DRBD 8.3.7 (Fedora 13 RPMs).
I've set up a DRBD test and first measured disk and network throughput without DRBD:
dd if=/dev/zero of=/data.tmp bs=512M count=1 oflag=direct
536870912 bytes (537 MB) copied, 4.62985 s, 116 MB/s
(/ is a logical volume on the disk I'm testing with, mounted without DRBD.)
iperf:
[ 4] 0.0-10.0 sec 1.10 GBytes 941 Mbits/sec
According to the "Throughput overhead expectations" section of the manual, the bottleneck should be whichever is slower, the network or the disk, and DRBD should add an overhead of 3%. In my case the network and disk I/O are fairly evenly matched, so I should be able to get around 100 MB/s.
With the raw DRBD device, I get:
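As a sanity check on that expectation, here is a minimal sketch of the arithmetic. It assumes the model really is "slower of disk and network, minus 3%", that iperf's Mbits are 10^6 bits, and that dd's MB are 10^6 bytes. The function names are mine, not anything from DRBD.

```python
# Baseline figures taken from the dd and iperf runs above.
DISK_MB_S = 116.0        # dd to the plain logical volume
NETWORK_MBIT_S = 941.0   # iperf between the two nodes
DRBD_OVERHEAD = 0.03     # the 3% figure quoted from the manual


def mbit_to_mbyte(mbit_per_s):
    """Convert megabits per second to megabytes per second."""
    return mbit_per_s / 8.0


def expected_throughput(disk_mb_s, network_mbit_s, overhead=DRBD_OVERHEAD):
    """Slower of disk and network, reduced by the assumed DRBD overhead."""
    bottleneck = min(disk_mb_s, mbit_to_mbyte(network_mbit_s))
    return bottleneck * (1.0 - overhead)


if __name__ == "__main__":
    print(f"network: {mbit_to_mbyte(NETWORK_MBIT_S):.1f} MB/s")
    print(f"expected with DRBD: {expected_throughput(DISK_MB_S, NETWORK_MBIT_S):.1f} MB/s")
```

With these inputs the network works out to about 117.6 MB/s, so the disk is marginally the slower of the two, and the expected figure is about 112.5 MB/s. That is consistent with the rough 100 MB/s estimate.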
dd if=/dev/zero of=/dev/drbd2 bs=512M count=1 oflag=direct
536870912 bytes (537 MB) copied, 6.61362 s, 81.2 MB/s
which is slower than I would expect. Then, once I format the device with ext4 and mount it, I get:
dd if=/dev/zero of=/mnt/data.tmp bs=512M count=1 oflag=direct
536870912 bytes (537 MB) copied, 9.60918 s, 55.9 MB/s
This doesn't seem right. There must be some other factor playing into this that I'm not aware of.
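To put numbers on the gap, the following sketch recomputes each throughput from the byte count and elapsed time that dd printed, then expresses it as a fraction lost relative to the 116 MB/s baseline without DRBD. The helper names are made up for illustration, and the only assumption is that dd's MB means 10^6 bytes.

```python
BASELINE_MB_S = 116.0  # dd result without DRBD


def dd_rate_mb_s(nbytes, seconds):
    """Throughput in decimal megabytes (10**6 bytes) per second."""
    return nbytes / seconds / 1e6


def slowdown(measured_mb_s, baseline_mb_s=BASELINE_MB_S):
    """Fraction of the baseline throughput that was lost."""
    return 1.0 - measured_mb_s / baseline_mb_s


# (bytes copied, elapsed seconds) as printed by dd in the runs above.
runs = {
    "raw DRBD device": (536870912, 6.61362),
    "ext4 on DRBD": (536870912, 9.60918),
}

for label, (nbytes, seconds) in runs.items():
    rate = dd_rate_mb_s(nbytes, seconds)
    print(f"{label}: {rate:.1f} MB/s, {slowdown(rate):.0%} below baseline")
```

That comes to roughly 30% below baseline for the raw device and roughly 52% below for ext4, both far from the 3% the manual suggests.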
global_common.conf
global {
    usage-count yes;
}
common {
    protocol C;
}
syncer {
    al-extents 1801;
    rate 33M;
}
data_mirror.res
resource data_mirror {
    device /dev/drbd1;
    disk /dev/sdb1;
    meta-disk internal;
    on cluster1 {
        address 192.168.33.10:7789;
    }
    on cluster2 {
        address 192.168.33.12:7789;
    }
}
For the hardware I have two identical machines:
6 GB RAM
Quad-core AMD Phenom, 3.2 GHz
Motherboard SATA controller
7200 RPM 1 TB WD drive with 64 MB cache
The network is 1 Gbit/s, connected through a switch. I know that a direct connection is recommended, but could it make this much of a difference?
Edited
I monitored the network bandwidth to see what's happening. I used ibmonitor and measured the average bandwidth while running the dd test 10 times. I got:
avg ~450 Mbit/s writing to ext4
avg ~800 Mbit/s writing to the raw device
It looks like with ext4, DRBD uses about half the bandwidth it uses with the raw device, so there's a bottleneck that is not the network.
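For comparison with the dd results, this small sketch converts the monitored bandwidth into MB/s. It assumes the ibmonitor averages are in units of 10^6 bits per second and treats the "~" figures as exact, so the output is only approximate.

```python
def mbit_to_mbyte(mbit_per_s):
    """Convert megabits per second to megabytes per second."""
    return mbit_per_s / 8.0


# (approximate average network bandwidth in Mbit/s, dd result in MB/s)
observed = {
    "ext4 on DRBD": (450.0, 55.9),
    "raw DRBD device": (800.0, 81.2),
}

for label, (net_mbit_s, dd_mb_s) in observed.items():
    net_mb_s = mbit_to_mbyte(net_mbit_s)
    print(f"{label}: network {net_mb_s:.2f} MB/s, dd {dd_mb_s} MB/s")

# How much of the raw-device bandwidth the ext4 run used.
print(f"ext4/raw bandwidth ratio: {450.0 / 800.0:.2f}")
```

The ext4 figure (about 56 MB/s on the wire) lines up closely with the 55.9 MB/s that dd reported, and neither run comes near the 941 Mbit/s that iperf achieved.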