Hyper-V Live Migration across Sites!
- by Ryan Roussel
One of the great sessions I sat in on at TechEd this week covered stretching a Windows Server 2008 R2 Hyper-V failover cluster across sites. With this ability, you can build a Hyper-V cluster in which VMs can be migrated, or even Live Migrated, between sites. With this area’s propensity for hurricanes, this will be a very popular topic for me over the next few months. While the technology is available today, it is also complicated and can be very expensive to implement.
First, your WAN connection has to be able to trunk your VLAN across both sites in order to Live Migrate. This means you can’t use a Layer 3 routed connection like MPLS. It has to be a Metro Ethernet connection or “dark fiber”, which is unused fiber already in the ground that can be leased from various providers. Either of these connections allows you to trunk Layer 2 across your WAN. Cisco does have the ability to extend Layer 2 across a routed connection by encapsulating the traffic, but this is only available in their Nexus product line, which carries a very steep price tag.
If you are stuck with MPLS or the like and Nexus switching is not a realistic possibility, you will have to implement a multi-subnet cluster, in which case Live Migration won’t be possible. You can still fail over VMs to the remote site, however, with some planning and manual intervention. The VMs will be on a different subnet once they move, so you will have to change their IP addressing. This also affects DNS and name resolution: clients cannot reach a VM until its records point at the new address, so plan for that when estimating your downtime. DHCP with reservations for your VMs is the preferred way to handle the IP changes, as it automates that part of the process.
Second, you need a mechanism to replicate your storage between the two sites. Many SAN vendors natively support hardware-based synchronous and asynchronous replication. Some even support Cluster Shared Volumes, which were introduced in 2008 R2.
If your SANs do not support this natively, there are alternative file-based replication products, either software-based like Double-Take or hardware appliances like those from EMC. Be sure to check with your vendor on support for Disk Majority if you’re replicating your quorum disk between SANs.
The last consideration is the ability to maintain quorum for your cluster. If your replication provider does not support Disk Majority through replication, you will have to explore Node Majority with a File Share Witness (FSW). This affects your design: a 3-node cluster with 1 node at the remote site and the FSW at the production site holds 4 votes, and losing the production site leaves only 1, so the cluster cannot maintain quorum. Microsoft’s best practice is to implement an even-node cluster with 2 nodes at each site and the FSW at a third site.
And there you have it. While some consideration and research go into implementing this solution, even a multi-subnet solution would be invaluable to organizations implementing “warm” DR sites.
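For anyone who wants to sanity-check a design, the quorum arithmetic for Node Majority with a File Share Witness can be sketched in a few lines of Python. This is only a simplified model I put together, not anything from the session: it assumes every node and the FSW carries exactly one vote and that the cluster needs a strict majority of all configured votes to stay up. The site names and the `survives_site_loss` helper are made up for illustration.

```python
from typing import Dict


def has_quorum(surviving_votes: int, total_votes: int) -> bool:
    """Strict majority: more than half of all configured votes must remain."""
    return surviving_votes > total_votes // 2


def survives_site_loss(votes_by_site: Dict[str, int], lost_site: str) -> bool:
    """Return True if the cluster keeps quorum after losing every vote at one site."""
    total = sum(votes_by_site.values())
    surviving = total - votes_by_site[lost_site]
    return has_quorum(surviving, total)


# 3-node cluster: 2 nodes plus the FSW at production (3 votes), 1 node remote.
three_node = {"production": 3, "remote": 1}

# Even-node design: 2 nodes per site, FSW alone at a third site.
stretched = {"production": 2, "remote": 2, "witness_site": 1}

print(survives_site_loss(three_node, "production"))  # False: 1 of 4 votes left
print(survives_site_loss(stretched, "production"))   # True: 3 of 5 votes left
```

The second layout survives the loss of any single site, which is the point of putting the FSW somewhere other than your two datacenters. The model ignores manual recovery steps such as forcing quorum on the surviving node.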