OK, this is my setup:
FC switches: IBM/Brocade, Switch1 and Switch2, independent fabrics.
Server: IBM x3650 M2 with 2x QLogic QLE2460 HBAs, one connected to each FC switch.
Storage: IBM DS3524, two controllers with four FC ports each, but only two ports connected per controller.
+-----------------------------------------------------------------------+
|  HBA1                           Server                          HBA2  |
+-----------------------------------------------------------------------+
     |                                                             |
     |                                                             |
     |                                                             |
+-----------------------------+          +------------------------------+
|           Switch1           |          |           Switch2            |
+-----------------------------+          +------------------------------+
         |                 |                 |                 |
         |                 |                 |                 |
         |                 |                 |                 |
         |                 |                 |                 |
         |                 |                 |                 |
+-----------------------------------+-----------------------------------+
| Contr A, port 3 | Contr A, port 4 | Contr B, port 3 | Contr B, port 4 |
+-----------------------------------+-----------------------------------+
|                                Storage                                |
+-----------------------------------------------------------------------+
My /etc/multipath.conf is taken from the IBM Redbook for the DS3500, except for the prio_callout setting: IBM specifies /sbin/mpath_prio_tpc, but according to the Ubuntu multipath-tools changelog (http://changelogs.ubuntu.com/changelogs/pool/main/m/multipath-tools/multipath-tools_0.4.8-7ubuntu2/changelog), that binary was renamed to /sbin/mpath_prio_rdac, so that's what I'm using.
devices {
    device {
        # DS3500
        vendor                "IBM"
        product               "1746 FAStT"
        hardware_handler      "1 rdac"
        path_checker          rdac
        failback              0
        path_grouping_policy  multibus
        prio_callout          "/sbin/mpath_prio_rdac /dev/%n"
    }
}
multipaths {
    multipath {
        wwid                  xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
        alias                 array07
        path_grouping_policy  multibus
        path_checker          readsector0
        path_selector         "round-robin 0"
        failback              "5"
        rr_weight             priorities
        no_path_retry         "5"
    }
}
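Note that per-LUN settings in the multipaths{} section override the devices{} section, so the effective checker for array07 is readsector0 (which evidently maps to directio here, judging by the log messages below), not rdac, and the grouping is multibus rather than priority-based. A variant that keeps the rdac checker and groups paths by controller priority would look something like this -- a sketch only, I haven't verified that it actually produces [active][ghost]:

```
multipaths {
    multipath {
        wwid                  xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
        alias                 array07
        path_grouping_policy  group_by_prio
        path_checker          rdac
        path_selector         "round-robin 0"
        failback              immediate
        no_path_retry         "5"
    }
}
```

With group_by_prio, the paths on the owning controller should form the active group and the passive controller's paths a lower-priority group, instead of all four paths being lumped into one multibus group.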
The output of multipath -ll with controller A as the preferred path:
root@db06:~# multipath -ll
sdg: checker msg is "directio checker reports path is down"
sdh: checker msg is "directio checker reports path is down"
array07 (xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx) dm-2 IBM ,1746 FASt
[size=4.9T][features=1 queue_if_no_path][hwhandler=0]
\_ round-robin 0 [prio=2][active]
 \_ 5:0:1:0 sdd 8:48  [active][ready]
 \_ 5:0:2:0 sde 8:64  [active][ready]
 \_ 6:0:1:0 sdg 8:96  [failed][faulty]
 \_ 6:0:2:0 sdh 8:112 [failed][faulty]
If I change the preferred path using IBM DS Storage Manager to Controller B, the output swaps accordingly:
root@db06:~# multipath -ll
sdd: checker msg is "directio checker reports path is down"
sde: checker msg is "directio checker reports path is down"
array07 (xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx) dm-2 IBM ,1746 FASt
[size=4.9T][features=1 queue_if_no_path][hwhandler=0]
\_ round-robin 0 [prio=2][active]
 \_ 5:0:1:0 sdd 8:48  [failed][faulty]
 \_ 5:0:2:0 sde 8:64  [failed][faulty]
 \_ 6:0:1:0 sdg 8:96  [active][ready]
 \_ 6:0:2:0 sdh 8:112 [active][ready]
According to IBM, the inactive path should be "[active][ghost]", not "[failed][faulty]".
Despite this, I don't seem to have any I/O issues, but my syslog is being spammed with this every 5 seconds:
Jun 1 15:30:09 db06 multipathd: sdg: directio checker reports path is down
Jun 1 15:30:09 db06 kernel: [ 2350.282065] sd 6:0:2:0: [sdh] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Jun 1 15:30:09 db06 kernel: [ 2350.282071] sd 6:0:2:0: [sdh] Sense Key : Illegal Request [current]
Jun 1 15:30:09 db06 kernel: [ 2350.282076] sd 6:0:2:0: [sdh] <<vendor>> ASC=0x94 ASCQ=0x1ASC=0x94 ASCQ=0x1
Jun 1 15:30:09 db06 kernel: [ 2350.282083] sd 6:0:2:0: [sdh] CDB: Read(10): 28 00 00 00 00 00 00 00 08 00
Jun 1 15:30:09 db06 kernel: [ 2350.282092] end_request: I/O error, dev sdh, sector 0
Jun 1 15:30:10 db06 multipathd: sdh: directio checker reports path is down
Jun 1 15:30:14 db06 kernel: [ 2355.312270] sd 6:0:1:0: [sdg] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Jun 1 15:30:14 db06 kernel: [ 2355.312277] sd 6:0:1:0: [sdg] Sense Key : Illegal Request [current]
Jun 1 15:30:14 db06 kernel: [ 2355.312282] sd 6:0:1:0: [sdg] <<vendor>> ASC=0x94 ASCQ=0x1ASC=0x94 ASCQ=0x1
Jun 1 15:30:14 db06 kernel: [ 2355.312290] sd 6:0:1:0: [sdg] CDB: Read(10): 28 00 00 00 00 00 00 00 08 00
Jun 1 15:30:14 db06 kernel: [ 2355.312299] end_request: I/O error, dev sdg, sector 0
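For what it's worth, decoding the READ(10) CDB from those kernel lines confirms what the checker is doing: an 8-block read at LBA 0, i.e. the probe that shows up as "I/O error, dev sdh, sector 0". A quick decode in Python (just parsing the hex bytes from the log):

```python
# Decode the READ(10) CDB printed in the kernel log to see what the
# path checker is actually asking the passive controller to do.
cdb = bytes.fromhex("28 00 00 00 00 00 00 00 08 00".replace(" ", ""))

opcode = cdb[0]                        # 0x28 = READ(10)
lba = int.from_bytes(cdb[2:6], "big")  # logical block address (bytes 2-5)
blocks = int.from_bytes(cdb[7:9], "big")  # transfer length in blocks (bytes 7-8)

print(f"opcode=0x{opcode:02x} lba={lba} blocks={blocks}")
# -> opcode=0x28 lba=0 blocks=8
```

So the passive controller rejects a plain read of sector 0, which (as I understand it) is normal RDAC behaviour for the non-owning controller; the rdac checker queries controller state instead of reading data.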
Does anyone know how I can get the inactive path to show "[active][ghost]" instead of "[failed][faulty]"? I assume that once that's fixed, the syslog spam will stop as well.
One final thing worth mentioning: the IBM Redbook targets SLES 11, so I'm assuming something works slightly differently under Ubuntu that I just haven't figured out yet.
Update: As suggested by Mitch, I've tried removing /etc/multipath.conf, and now the output of multipath -ll looks like this:
root@db06:~# multipath -ll
sdg: checker msg is "directio checker reports path is down"
sdh: checker msg is "directio checker reports path is down"
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx dm-1 IBM ,1746 FASt
[size=4.9T][features=0][hwhandler=0]
\_ round-robin 0 [prio=1][active]
 \_ 5:0:2:0 sde 8:64  [active][ready]
\_ round-robin 0 [prio=1][enabled]
 \_ 5:0:1:0 sdd 8:48  [active][ready]
\_ round-robin 0 [prio=0][enabled]
 \_ 6:0:1:0 sdg 8:96  [failed][faulty]
\_ round-robin 0 [prio=0][enabled]
 \_ 6:0:2:0 sdh 8:112 [failed][faulty]
So it's more or less the same, with the same syslog messages every 5 seconds as before, but the grouping has changed.