What's up with LDoms: Part 9 - Direct IO
- by Stefan Hinker
In the last article of this series, we discussed the most general of all physical IO options available for LDoms: root domains. Now, let's have a short look at the next level of granularity: virtualizing individual PCIe slots. In LDoms terminology, this feature is called "Direct IO" or DIO. It is very similar to root domains, but instead of reassigning ownership of a complete root complex, it only moves a single PCIe slot or endpoint device to a different domain. Let's look again at the hardware available to mars in the original configuration:
root@sun:~# ldm ls-io
NAME TYPE BUS DOMAIN STATUS
---- ---- --- ------ ------
pci_0 BUS pci_0 primary
pci_1 BUS pci_1 primary
pci_2 BUS pci_2 primary
pci_3 BUS pci_3 primary
/SYS/MB/PCIE1 PCIE pci_0 primary EMP
/SYS/MB/SASHBA0 PCIE pci_0 primary OCC
/SYS/MB/NET0 PCIE pci_0 primary OCC
/SYS/MB/PCIE5 PCIE pci_1 primary EMP
/SYS/MB/PCIE6 PCIE pci_1 primary EMP
/SYS/MB/PCIE7 PCIE pci_1 primary EMP
/SYS/MB/PCIE2 PCIE pci_2 primary EMP
/SYS/MB/PCIE3 PCIE pci_2 primary OCC
/SYS/MB/PCIE4 PCIE pci_2 primary EMP
/SYS/MB/PCIE8 PCIE pci_3 primary EMP
/SYS/MB/SASHBA1 PCIE pci_3 primary OCC
/SYS/MB/NET2 PCIE pci_3 primary OCC
/SYS/MB/NET0/IOVNET.PF0 PF pci_0 primary
/SYS/MB/NET0/IOVNET.PF1 PF pci_0 primary
/SYS/MB/NET2/IOVNET.PF0 PF pci_3 primary
/SYS/MB/NET2/IOVNET.PF1 PF pci_3 primary
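The listing mixes buses, slots and physical functions. As a small sketch, the occupied PCIE endpoints can be pulled out with awk. The function names `occupied_pcie` and `sample_listing` are made up for this example, and the filter assumes that every PCIE row has a value in the DOMAIN column, as in the listing above. On a real control domain you would pipe `ldm ls-io` into the filter instead of the shortened sample.

```shell
#!/bin/sh
# Sketch: show only PCIE endpoints whose status is OCC (occupied).
# Assumed columns, as in the listing above: NAME TYPE BUS DOMAIN STATUS

occupied_pcie() {
    # Print name, bus and owning domain for occupied PCIE rows.
    awk '$2 == "PCIE" && $5 == "OCC" { print $1, $3, $4 }'
}

# Shortened copy of the listing, standing in for real `ldm ls-io` output.
sample_listing() {
cat <<'EOF'
NAME TYPE BUS DOMAIN STATUS
---- ---- --- ------ ------
pci_0 BUS pci_0 primary
/SYS/MB/PCIE1 PCIE pci_0 primary EMP
/SYS/MB/SASHBA0 PCIE pci_0 primary OCC
/SYS/MB/NET0 PCIE pci_0 primary OCC
/SYS/MB/PCIE3 PCIE pci_2 primary OCC
/SYS/MB/NET0/IOVNET.PF0 PF pci_0 primary
EOF
}

sample_listing | occupied_pcie
```

Empty slots (EMP), buses and physical functions are dropped, which leaves the endpoints that could be considered for reassignment.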
All of the "PCIE" type devices are available for DIO, with a few limitations. If the device is a slot, the card in that slot must support the DIO feature. The documentation lists all such cards. Moving a slot to a different domain works just like moving a PCI root complex. Again, this is not a dynamic process and includes reboots of the affected domains. The resulting configuration is nicely shown in a diagram in the Admin Guide.
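Because the move is static, the overall flow can be sketched as a short command sequence. The script below only prints the commands and does not run them. The slot /SYS/MB/PCIE3 (occupied in the listing above) and the guest name mars are assumptions picked for illustration, and `dio_move_plan` is a made-up helper name. Steps such as saving the configuration to the service processor are left out, so treat this as a rough outline and not as the complete procedure from the Admin Guide.

```shell
#!/bin/sh
# Sketch only: print the commands for moving one PCIe slot from the
# primary domain to a guest. Nothing is executed.
#
# Printed steps, in order:
#   1. put primary into delayed reconfiguration
#   2. remove the slot from primary
#   3. reboot primary so the change takes effect
#   4. stop the guest
#   5. add the slot to the guest
#   6. start the guest again

dio_move_plan() {
    slot=$1     # e.g. /SYS/MB/PCIE3 (assumed example slot)
    guest=$2    # e.g. mars (assumed guest domain name)
    cat <<EOF
ldm start-reconf primary
ldm remove-io $slot primary
shutdown -i6 -g0 -y
ldm stop $guest
ldm add-io $slot $guest
ldm start $guest
EOF
}

dio_move_plan /SYS/MB/PCIE3 mars
```

Printing the plan first makes it easy to review the order of the reboots before touching a live system.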
There are several important things to note and consider here:
The domain receiving the slot/endpoint device turns into an IO domain in LDoms terminology, because it now owns some physical IO hardware.
Solaris will create nodes for this hardware under /devices. This includes entries for the virtual PCI root complex (pci_0 in the diagram) and anything between it and the actual endpoint device. It is very important to understand that all of this PCIe infrastructure is virtual only! Only the actual endpoint devices are true physical hardware.
There is an implicit dependency between the guest owning the endpoint device and the root domain owning the real PCIe infrastructure:
The guest domain has access to the endpoint device only while the root domain is up and running.
The root domain is still responsible for resetting and configuring the PCIe infrastructure (root complex, PCIe level configurations, error handling etc.) because it owns this part of the physical infrastructure.
This also means that if the root domain needs to reset the PCIe root complex for any reason (typically a reboot of the root domain), it will reset and thus disrupt the operation of the endpoint device owned by the guest domain. The result in the guest is not predictable. I recommend configuring the resulting behaviour of the guest using domain dependencies, as described in the Admin Guide in Chapter "Configuring Domain Dependencies".
Please consult the Admin Guide in Section "Creating an I/O Domain by Assigning PCIe Endpoint Devices" for all the details!
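The domain dependency mentioned above comes down to two `ldm set-domain` settings: a failure policy on the master (here the root domain) and a master entry on the dependent guest. The following sketch only prints the two commands. It assumes primary is the root domain and mars is the guest, and `dependency_plan` is a made-up helper name. Which failure policy is appropriate depends on your workload; reset is used here purely as an example.

```shell
#!/bin/sh
# Sketch only: print the commands that make a guest depend on its root domain.
# Nothing is executed.

dependency_plan() {
    master=$1   # root domain owning the real PCIe infrastructure (assumed: primary)
    slave=$2    # guest owning the endpoint device (assumed: mars)
    policy=$3   # what happens to the guest when the master fails

    # Accept only the documented failure-policy values.
    case $policy in
        ignore|panic|reset|stop) ;;
        *) echo "unknown failure-policy: $policy" >&2; return 1 ;;
    esac

    echo "ldm set-domain failure-policy=$policy $master"
    echo "ldm set-domain master=$master $slave"
}

dependency_plan primary mars reset
```

With a policy other than ignore, the guest reacts in a defined way when the root domain goes down, instead of continuing with an endpoint device in an unknown state.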
As you can see, there are several restrictions for this feature. It was introduced in LDoms 2.0, mainly to allow the configuration of guest domains that need access to tape devices. Today, with the higher number of PCIe root complexes and the availability of SR-IOV, the need to use this feature is declining. I personally do not recommend using it, mainly because of the drawbacks of the dependencies on the root domain and because it can be replaced with SR-IOV (although then with similar limitations).
This was a rather short entry, more for completeness. I believe that DIO can usually be replaced by SR-IOV, which is much more flexible. I will cover SR-IOV in the next section of this blog series.