Best Practices - Dynamic Reconfiguration
- by jsavit
This post is one of a series of "best practices" notes for Oracle VM Server for SPARC (formerly named Logical Domains).
Overview of Dynamic Reconfiguration
Oracle VM Server for SPARC supports Dynamic Reconfiguration (DR), making it possible to add or remove
resources to or from a domain (virtual machine) while it is running.
This is extremely useful because resources can be shifted to or from virtual machines in response to
load conditions without having to reboot or interrupt running applications.
For example, if an application requires more CPU capacity,
you can add CPUs to improve performance, and remove them when they are no longer needed.
You can even use Dynamic Resource Management (DRM) policies that automatically add CPUs to
and remove CPUs from domains based on load.
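To make the DRM idea concrete, here is a minimal sketch of what a policy command could look like. It only prints an ldm add-policy command line instead of running it, so it is safe to try anywhere. The helper name drm_policy_cmd, the policy name busy-hours and all the thresholds are made up for illustration; util-lower, util-upper, vcpu-min, vcpu-max and name are ldm add-policy properties, but check the ldm man page for your release before using them.

```shell
# Build (but do not run) an 'ldm add-policy' command line.
# Arguments: policy-name domain util-lower util-upper vcpu-min vcpu-max
# drm_policy_cmd is a hypothetical helper, not part of ldm.
drm_policy_cmd() {
  printf 'ldm add-policy util-lower=%s util-upper=%s vcpu-min=%s vcpu-max=%s name=%s %s\n' \
    "$3" "$4" "$5" "$6" "$1" "$2"
}

# Hypothetical values: add CPUs to ldom1 above 75% utilization, remove them
# below 25%, and stay between 8 and 40 virtual CPUs.
drm_policy_cmd busy-hours ldom1 25 75 8 40
```

Printing the command first is just a cautious habit: you can review the line and then paste it on the control domain.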
How it works (in broad general terms)
Dynamic Reconfiguration is done in coordination with Solaris, which recognises a hypervisor request to change its
virtual machine configuration and responds appropriately. In essence, Solaris receives a message saying
"you now have 16 more CPUs numbered 16 to 31" or "8GB more RAM starting at address X" or
"here's a new network or disk device - have fun with it". These actions take very little time.
Solaris then can start using the new resource. In the case of added CPUs, that means dispatching processes
and potentially binding interrupts to the new CPUs. For memory, Solaris adds the new memory pages to its
"free" list and starts using them. Comparable actions occur with network and disk devices: they are recognised
by Solaris and then used.
Removing is the reverse process: after receiving the DR message to free specific CPUs,
Solaris unbinds interrupts assigned to the CPUs and stops dispatching process threads.
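That takes very little time. In the example below, ldom1 grows from 16 virtual CPUs to 5 cores
(40 virtual CPUs) and then shrinks back to 2 cores (16 virtual CPUs).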
primary # ldm list
NAME STATE FLAGS CONS VCPU MEMORY UTIL UPTIME
primary active -n-cv- SP 16 4G 1.0% 6d 22h 29m
ldom1 active -n---- 5000 16 8G 0.9% 6h 59m
primary # ldm set-core 5 ldom1
primary # ldm list
NAME STATE FLAGS CONS VCPU MEMORY UTIL UPTIME
primary active -n-cv- SP 16 4G 0.2% 6d 22h 29m
ldom1 active -n---- 5000 40 8G 0.1% 6h 59m
primary # ldm set-core 2 ldom1
primary # ldm list
NAME STATE FLAGS CONS VCPU MEMORY UTIL UPTIME
primary active -n-cv- SP 16 4G 1.0% 6d 22h 29m
ldom1 active -n---- 5000 16 8G 0.9% 6h 59m
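If you script this kind of change, it helps to read the VCPU count back and confirm it matches the cores you asked for. The sketch below does that against a copy of the ldm list output shown above rather than a live system. The helper name domain_vcpus is hypothetical, and the 8 threads per core figure is an assumption inferred from this transcript (5 cores showed up as 40 VCPUs); other SPARC processors may differ.

```shell
# Sample 'ldm list' output copied from the transcript above.
ldm_list_sample='NAME STATE FLAGS CONS VCPU MEMORY UTIL UPTIME
primary active -n-cv- SP 16 4G 0.2% 6d 22h 29m
ldom1 active -n---- 5000 40 8G 0.1% 6h 59m'

# Print the VCPU column (field 5) for the domain named in $1.
# Reads 'ldm list' style output on standard input.
domain_vcpus() {
  awk -v dom="$1" '$1 == dom { print $5 }'
}

# Assumption: 8 hardware threads per core, as implied by the transcript.
threads_per_core=8

vcpus=$(printf '%s\n' "$ldm_list_sample" | domain_vcpus ldom1)
echo "ldom1: $vcpus VCPUs = $((vcpus / threads_per_core)) cores"
# prints: ldom1: 40 VCPUs = 5 cores
```

On a real control domain you would pipe the output of ldm list into domain_vcpus instead of the sample text.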
Removing memory takes more work. Memory pages are vacated by copying their contents to other memory
locations, and the vacated pages are then wiped clean.
Solaris may have to swap memory contents to disk if the remaining RAM isn't enough to hold all the contents.
For this reason, deallocating memory can take longer on a loaded system. Even on a lightly loaded
system it took 7 or 8 seconds to switch the domain below between 8GB and 24GB of RAM.
primary # ldm set-mem 24g ldom1
primary # ldm list
NAME STATE FLAGS CONS VCPU MEMORY UTIL UPTIME
primary active -n-cv- SP 16 4G 0.1% 6d 22h 36m
ldom1 active -n---- 5000 16 24G 0.2% 7h 6m
primary # ldm set-mem 8g ldom1
primary # ldm list
NAME STATE FLAGS CONS VCPU MEMORY UTIL UPTIME
primary active -n-cv- SP 16 4G 0.7% 6d 22h 37m
ldom1 active -n---- 5000 16 8G 0.3% 7h 7m
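If you want to measure how long a memory resize takes on your own system, a small wrapper like the one below is enough. This is only a sketch: the name timed is made up, it assumes a date command that supports the +%s format, and it uses sleep as a stand-in so it can be tried without a control domain.

```shell
# Run a command and report on stderr how long it took, in whole seconds.
# Assumes a date(1) that supports the +%s format.
# The command's own exit status is passed through unchanged.
timed() {
  timed_start=$(date +%s)
  "$@"
  timed_status=$?
  timed_end=$(date +%s)
  echo "elapsed: $((timed_end - timed_start))s" >&2
  return "$timed_status"
}

# On the control domain this would be used as (not run here):
#   timed ldm set-mem 8g ldom1
# Stand-in command so the sketch runs anywhere:
timed sleep 1
```

Because the exit status is preserved, a script can still tell whether the resize itself succeeded.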
What if the device is in use?
(this is the anecdote that inspired this blog post)
If CPU or memory is being removed, releasing it is pretty straightforward, using the method described above.
The resources are released, and Solaris continues with less capacity.
It's not as simple with a network or I/O device: you don't want to yank a device out from underneath an application
that might be using it. In the following example, I've added a virtual network device to ldom1 and want to take it away,
even though it's been plumbed.
primary # ldm rm-vnet vnet19 ldom1
Guest LDom returned the following reason for failing the operation:
Resource Information
---------------------------------------------------------- -----------------------
/devices/virtual-devices@100/channel-devices@200/network@1 Network interface net1
VIO operation failed because device is being used in LDom ldom1
Failed to remove VNET instance
That's what I call a helpful error message - telling me exactly what was wrong.
In this case the problem is easily solved. I know this NIC is seen in the guest as net1 so:
ldom1 # ifconfig net1 down unplumb
Now I can dispose of it, and even the virtual switch I had created for it:
primary # ldm rm-vnet vnet19 ldom1
primary # ldm rm-vsw primary-vsw9
If I had to take away the device disruptively, I could have used ldm rm-vnet -f
but that could disrupt whoever was using it. It's better if that can be avoided.
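Since the error message names the interface that is still in use, a script can pick it out and tell the operator what to unplumb. The sketch below works on a copy of the message shown above, not on a live ldm command. The helper name busy_interfaces is hypothetical, and it assumes the "Network interface <name>" wording seen in this example; other releases may word the message differently.

```shell
# Error text copied from the failed 'ldm rm-vnet' attempt above.
rm_vnet_error='Guest LDom returned the following reason for failing the operation:
Resource Information
---------------------------------------------------------- -----------------------
/devices/virtual-devices@100/channel-devices@200/network@1 Network interface net1
VIO operation failed because device is being used in LDom ldom1
Failed to remove VNET instance'

# Print the guest interface names reported as busy (last word of each
# "Network interface" line). Reads the error text on standard input.
busy_interfaces() {
  awk '/Network interface/ { print $NF }'
}

for nic in $(printf '%s\n' "$rm_vnet_error" | busy_interfaces); do
  echo "in the guest, run: ifconfig $nic down unplumb"
done
# prints: in the guest, run: ifconfig net1 down unplumb
```

The sketch only prints the suggested command; whether it is safe to take the interface down is still a decision for whoever owns the guest.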
Summary
Oracle VM Server for SPARC provides dynamic reconfiguration, which lets you modify a guest domain's
CPU, memory and I/O configuration on the fly without reboot. You can add and remove resources as needed,
and even automate this for CPUs by setting up resource policies.
Taking things away can be more complicated than giving, especially for devices like disks and networks
that may contain application and system state or be involved in a transaction. LDoms and Solaris
work together to coordinate resource allocation and de-allocation in a safe and effective way.
For best practices, use dynamic reconfiguration to make the best use of your system's resources.