Best Practices - Core allocation
- by jsavit
This post is one of a series of "best practices" notes for Oracle VM Server for SPARC
(also called Logical Domains).
Introduction
SPARC T-series servers currently have up to 4 CPU sockets, each with up to 8
(or, on SPARC T3, 16) CPU cores, and each CPU core has 8 hardware threads, for a maximum of 512
dispatchable CPUs.
The defining feature of Oracle VM Server for SPARC is that each domain is assigned
CPU threads or cores for its exclusive use. This avoids the overhead of software-based
time-slicing and emulation (or binary rewriting) of system state-changing privileged instructions used in traditional hypervisors.
To create a domain, administrators specify the number of CPU threads or cores that the domain
will own, along with its memory and I/O resources.
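As a sketch of the basic workflow (the domain name, resource sizes, and the virtual switch and
disk service names below are illustrative, and assume those services have already been defined on
the control domain), a domain might be created and given its resources like this:
# ldm add-domain mydomain
# ldm add-vcpu 8 mydomain
# ldm add-memory 8G mydomain
# ldm add-vnet vnet0 primary-vsw0 mydomain
# ldm add-vdisk vdisk0 vol0@primary-vds0 mydomain
# ldm bind mydomain
# ldm start mydomain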
When CPU resources are assigned at the individual thread level,
the logical domains constraint manager attempts
to assign threads from the same cores to a domain
and to avoid "split core" situations in which a single CPU core is used by multiple domains.
Sometimes this is unavoidable, especially when CPUs are added to and removed from domains in
small increments.
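To see exactly how a domain's virtual CPUs map onto physical hardware threads, the listing
options of the ldm command are useful. The sketch below assumes a Logical Domains Manager version
that supports the cpu output option and uses mydomain as an illustrative name; physical strand IDs
on these chips are grouped eight per core, so strands from the same group of eight appearing under
different domains indicate a split core.
# ldm list -o cpu
lists every domain's virtual CPUs along with the physical strand each one is bound to, and
# ldm list -o cpu mydomain
restricts the output to a single domain.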
Why split cores can matter
Split core allocations can silently reduce performance because multiple domains
with different address spaces and memory contents share the core's Level 1 cache (L1$).
This is called false cache sharing, since even identical memory addresses from
different domains must point to different locations in RAM. The effect is
increased contention for the cache and higher memory latency for each domain using that core.
The degree of performance impact varies widely. For applications with very small
memory working sets, and for I/O-bound or low-CPU-utilization workloads, it may not matter at all:
all machines wait for work at the same speed.
If the domains run substantial workloads, or are critical
to performance, then the impact can be significant.
This blog entry was inspired by a customer issue in which one CPU core was split
among 3 domains, one of which was the control and service domain.
The reported problem was increased I/O latency in guest domains, but the root cause may have
been higher latency in servicing the I/O requests because the control domain was being slowed down.
What to do about it
Split core situations are easily avoided. In most cases the logical domains constraint manager
will avoid them without any administrative action, but they can be entirely prevented by one of
the following actions (a short worked example follows these options):
Assign virtual CPUs in multiples of 8 - the number of threads per core.
For example: ldm set-vcpu 8 mydomain or ldm add-vcpu 24 mydomain.
Each domain will then be allocated on a core boundary.
Use the whole-core constraint when assigning CPU resources. This allocates
CPUs in increments of entire cores instead of individual virtual CPU threads.
The equivalents of the commands above are
ldm set-core 1 mydomain and ldm add-core 3 mydomain. Older syntax
does the same thing by adding the -c flag to the add-vcpu,
rm-vcpu and set-vcpu commands, but the new syntax is recommended.
When whole-core allocation is used, an attempt to add cores to a domain fails if there aren't
enough completely empty cores to satisfy the request.
See https://blogs.oracle.com/sharakan/entry/oracle_vm_server_for_sparc4 for an excellent article
on this topic by Eric Sharakan.
Don't obsess: if the workloads have minimal CPU requirements and don't need anywhere
near a full CPU core, then don't worry about it. If you have low-utilization workloads being
consolidated from older machines onto a current T-series server, then there's no need to worry
about this or to assign an entire core to domains that will never use that much capacity.
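As a short worked example of the first two options above (the domain name and sizes are
illustrative, and depending on the Logical Domains Manager version the whole-core constraint may
only be changeable while the domain is stopped and unbound), a domain could be given two cores'
worth of CPU either way:
# ldm set-vcpu 16 mydomain
or, equivalently, with the whole-core constraint:
# ldm set-core 2 mydomain
and the resulting placement can then be checked with:
# ldm list -o cpu mydomain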
In any case, make sure the most important domains have their own CPU cores, in
particular the control domain and any I/O or service domain, and of course any important guests.
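For example, to keep the control domain (conventionally named primary) on a core boundary,
something like the following could be used; one core is just an illustrative amount, and depending
on the version and system state the change may take effect immediately or enter a delayed
reconfiguration that is applied when the control domain is next rebooted:
# ldm set-core 1 primary
or, with the thread-granular syntax:
# ldm set-vcpu 8 primary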
Summary
Split core CPU allocation to domains can reduce performance,
but the logical domains manager tends to prevent this situation,
and it can be avoided completely and simply by allocating virtual CPUs on
core boundaries.