Best Practices - which domain types should be used to run applications
- by jsavit
This post is one of a series of "best practices" notes for Oracle VM Server for SPARC (formerly named Logical Domains).
One question that frequently comes up is "which types of domain should I use to run applications?"
There used to be a simple answer in most cases: "only run applications in guest domains". However,
enhancements to T-series servers and Oracle VM Server for SPARC, and the advent of SPARC SuperCluster, have made this question
more interesting and the answer more nuanced.
This article reviews the relevant concepts and provides suggestions on where to deploy applications in a logical domains environment.
Review: division of labor and types of domain
Oracle VM Server for SPARC offloads many functions from the hypervisor to domains (also called virtual machines).
This is a modern alternative to the "thick" hypervisor of traditional VM designs, which provides all virtualization functions itself.
Offloading permits a simpler hypervisor design, which enhances
reliability and security. It also reduces single points of failure by assigning responsibilities
to multiple system components, which further improves reliability and security.
In this architecture, management and I/O functionality are provided within domains.
Oracle VM Server for SPARC does this by defining the following types of domain, each with its own role:
Control domain - management control point for the server, used to configure domains and manage resources.
It is the first domain to boot on a power-up, is an I/O domain, and is usually a service domain as well.
I/O domain - has been assigned physical I/O devices: a PCIe root complex, a PCI device, or an
SR-IOV (Single Root I/O Virtualization) function. It has native performance and functionality for the devices it owns, unmediated by any virtualization layer.
Service domain - provides virtual network and disk devices to guest domains.
Guest domain - a domain whose devices are all virtual rather than physical: virtual network and disk devices provided by one or more service domains. In common practice, this is where applications are run.
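To make the role definitions concrete, here is a minimal Python sketch that derives a domain's roles from its attributes. The `Domain` class, its fields, the `PHYSICAL_IO_KINDS` labels, and the `roles()` function are hypothetical illustrations, not part of the ldm command or any Oracle API. It shows that one domain can hold several roles at once, and that a domain with no physical devices and no services to offer is a guest.

```python
from dataclasses import dataclass, field

# Kinds of physical I/O a domain can be assigned (per the definitions above).
# These labels are illustrative, not ldm terminology.
PHYSICAL_IO_KINDS = {"pcie_root_complex", "pci_device", "sriov_function"}


@dataclass
class Domain:
    # Hypothetical, illustrative model; not ldm output.
    name: str
    is_control: bool = False                       # management point, first to boot
    physical_io: set = field(default_factory=set)  # subset of PHYSICAL_IO_KINDS
    provides_virtual_io: bool = False              # exports virtual disk/network to guests


def roles(d: Domain) -> set:
    """Return every role the domain plays; a domain may have several."""
    unknown = d.physical_io - PHYSICAL_IO_KINDS
    if unknown:
        raise ValueError(f"unknown physical I/O kind(s): {sorted(unknown)}")
    if d.is_control and not d.physical_io:
        # The control domain boots first and needs physical I/O.
        raise ValueError(f"{d.name}: control domain must be an I/O domain")
    result = set()
    if d.is_control:
        result.add("control")
    if d.physical_io:
        result.add("io")
    if d.provides_virtual_io:
        result.add("service")
    if not result:
        # No physical devices and no services offered: all devices are virtual.
        result.add("guest")
    return result


primary = Domain("primary", is_control=True,
                 physical_io={"pcie_root_complex"}, provides_virtual_io=True)
print(sorted(roles(primary)))                                        # ['control', 'io', 'service']
print(sorted(roles(Domain("db1", physical_io={"sriov_function"}))))  # ['io']
print(sorted(roles(Domain("web1"))))                                 # ['guest']
```

The domain named `db1` illustrates a case discussed later in this post: a domain with an SR-IOV function is an I/O domain without being a service domain.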
Typical deployment
A service domain is generally also an I/O domain: otherwise it wouldn't have access to physical device "backends" to offer to its clients.
Similarly, an I/O domain is also typically a service domain in order to leverage the available PCI busses.
Control domains must be I/O domains, because they boot up first on the server and require physical I/O.
It's typical for the control domain to be a service domain as well, so it doesn't "waste" the I/O resources it uses.
A simple configuration consists of a control domain, which is also the one I/O and service domain, and some number of
guest domains using virtual I/O. In production, customers typically use multiple domains with I/O and service roles
to eliminate single points of failure: guest domains have virtual disk and virtual network devices provisioned from more than one service domain, so failure of a service domain, I/O path, or device doesn't result in an application outage.
This is also used for "rolling upgrades" in which service domains are upgraded one at a time while their guests continue to
operate without disruption.
(Resiliency to I/O device failures can also be provided within a single control domain by using multipath I/O.)
In this type of deployment, control, I/O, and service domains are used for virtualization infrastructure,
while applications run in guest domains.
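The redundancy rule above can be checked mechanically. The following sketch assumes a hypothetical mapping from each guest to the set of service domains that provide its virtual disk and network devices; it finds guests that depend on a single service domain and plans a rolling upgrade that takes down one service domain at a time. It works at domain granularity only, so device-level multipath I/O inside one domain is not modeled. The domain names `primary` and `alternate` and all function names are illustrative.

```python
# Hypothetical topology: guest name -> service domains providing its
# virtual disk and network devices. Names are illustrative only.
Topology = dict[str, set[str]]


def single_service_guests(topology: Topology) -> set[str]:
    """Guests whose virtual I/O comes from only one service domain."""
    return {g for g, svcs in topology.items() if len(svcs) < 2}


def guests_losing_io(topology: Topology, down: str) -> set[str]:
    """Guests with no remaining service domain if `down` is unavailable."""
    return {g for g, svcs in topology.items() if not (svcs - {down})}


def rolling_upgrade_plan(topology: Topology) -> list[str]:
    """Order in which to upgrade service domains one at a time.

    Raises RuntimeError if taking any single service domain down would
    leave a guest without virtual I/O (an application outage).
    """
    service_domains = sorted(set().union(*topology.values()))
    for sd in service_domains:
        affected = guests_losing_io(topology, sd)
        if affected:
            raise RuntimeError(f"upgrading {sd} would cut off {sorted(affected)}")
    return service_domains


redundant = {"app1": {"primary", "alternate"}, "app2": {"primary", "alternate"}}
print(rolling_upgrade_plan(redundant))  # ['alternate', 'primary']

fragile = {"app1": {"primary"}, "app2": {"primary", "alternate"}}
print(single_service_guests(fragile))   # {'app1'}
```

The plan is refused as soon as any step would leave a guest with no remaining service domain, which is exactly the situation a redundant configuration is meant to prevent.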
Changing application deployment patterns
The above model has been widely and successfully used, but more configuration options are available now.
Servers have grown beyond the original T2000-class machines with two I/O buses,
so there is more I/O capacity that can be used for applications.
Increased T-series server capacity made it attractive to run more vertical applications, such as databases,
with higher resource requirements than the "light" applications originally seen.
This in turn made it desirable to run applications in I/O domains so they could get bare-metal native I/O performance.
This is leveraged by the SPARC SuperCluster engineered system,
announced a year ago at Oracle OpenWorld.
In SPARC SuperCluster, I/O domains are used for high performance applications, with native I/O performance for disk and network
and optimized access to the InfiniBand fabric.
Another technical enhancement is the introduction of Direct I/O (DIO) and Single Root I/O Virtualization (SR-IOV),
which make it possible to give domains direct connections and native I/O performance for selected I/O devices.
A domain with either a DIO or SR-IOV device is an I/O domain.
In summary: not all I/O domains own PCIe root complexes, and there are increasingly more I/O domains that are not service domains.
Instead, they use their I/O connectivity to deliver performance to their own applications.
However, there are some limitations and considerations: at this time, a domain using physical I/O cannot be live-migrated to another server.
There is also a need to plan for security and to avoid introducing unneeded dependencies: if an I/O domain is also a service
domain providing virtual I/O to guests, it has the ability to affect the correct operation of its client guest domains.
This is even more relevant for the control domain, where the ldm command must be protected from unauthorized (or even mistaken) use
that would affect other domains. As a general rule, running applications in service domains or
the control domain should be avoided.
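The considerations above can be summarized as a per-domain checklist. This sketch uses a hypothetical `DomainInfo` record and an assumed guest-to-service-domain mapping; `placement_warnings()` is an illustrative name, not an Oracle tool. It simply flags the issues the text raises: no live migration with physical I/O, possible impact on client guests, and the need to protect the control domain.

```python
from dataclasses import dataclass


@dataclass
class DomainInfo:
    # Hypothetical, illustrative record; not ldm output.
    name: str
    is_control: bool = False
    has_physical_io: bool = False  # root complex, PCI device, or SR-IOV function


def placement_warnings(d: DomainInfo, topology: dict[str, set[str]]) -> list[str]:
    """Flag the concerns raised above for running applications in domain `d`.

    `topology` maps each guest to the service domains that serve it.
    """
    warnings = []
    if d.has_physical_io:
        warnings.append("cannot be live-migrated: uses physical I/O")
    clients = sorted(g for g, svcs in topology.items() if d.name in svcs)
    if clients:
        warnings.append(f"serves {clients}: problems here can affect those guests")
    if d.is_control:
        warnings.append("control domain: ldm must be protected from misuse")
    return warnings


topology = {"app1": {"primary", "alternate"}}
for w in placement_warnings(DomainInfo("primary", is_control=True, has_physical_io=True), topology):
    print(w)
```

A guest domain with only virtual I/O and no clients produces no warnings, which matches the article's point that guests remain the most flexible place for applications.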
To recap:
Guest domains with virtual I/O still provide the greatest operational flexibility, including features like live migration.
I/O domains can be used for applications with high performance requirements. This is used to great effect in SPARC SuperCluster and in general T4 deployments. Direct I/O (DIO) and Single Root I/O Virtualization (SR-IOV) make this more attractive by giving direct I/O access to more domains.
Service domains should in general not be used for applications, because compromised security in the domain, or an outage, can affect other domains that depend on it. This concern can be mitigated by providing guests with their virtual I/O from more than one service domain, so an interruption of service in one service domain does not cause an application outage.
The control domain should in general not be used to run applications, for the same reason.
SPARC SuperCluster uses the control domain for applications, but it is an exception: it's not a general purpose
environment; it's an engineered system with specifically configured applications, tuned for optimal performance.
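The recap can be read as a simple decision rule for placing a new application. The sketch below is an illustrative encoding under stated assumptions: the only inputs are whether the application needs native I/O performance and whether it needs live migration, and the SPARC SuperCluster exception for the control domain is deliberately not modeled. The `recommend()` function and `Placement` enum are hypothetical names.

```python
from enum import Enum


class Placement(Enum):
    GUEST = "guest domain with virtual I/O"
    IO_DOMAIN = "I/O domain (not a service domain)"


def recommend(needs_native_io: bool, needs_live_migration: bool) -> Placement:
    """Encode the recap: prefer guests; use I/O domains for native I/O.

    Service and control domains are never returned, per the general rule.
    The SPARC SuperCluster exception is intentionally not modeled.
    """
    if needs_native_io and needs_live_migration:
        # Per this article, a domain using physical I/O cannot currently be
        # live-migrated, so the two requirements conflict.
        raise ValueError("native I/O and live migration conflict; pick one")
    if needs_native_io:
        return Placement.IO_DOMAIN
    return Placement.GUEST


print(recommend(needs_native_io=True, needs_live_migration=False).value)
```

Raising an error for the conflicting case, rather than silently picking one, reflects that the trade-off between native I/O and live migration is a decision to make per application.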
These are recommended "best practices" based on conversations with a number of Oracle architects.
Keep in mind that "one size does not fit all", so you should evaluate these practices in the context of your own requirements.
Summary
Higher capacity T-series servers have made it more attractive to use them for applications with high resource requirements.
New deployment models permit native I/O performance for demanding applications by running them in I/O domains
with direct access to their devices. This is leveraged in SPARC SuperCluster, and can also be applied on T-series servers
to provision high-performance applications running in domains. With careful planning, this approach can provide higher performance
for critical applications.