WHY HA AND WHAT IT DOES
XenServer (XS) contains a native high-availability (HA) option which allows quite a bit of flexibility in determining the state of a pool of hosts and under what circumstances Virtual Machines (VMs) are to be restarted on alternative hosts in the event of the loss of the ability of a host to be able to serve VMs. HA is a very useful feature that protects VMs from staying failed in the event of a server crash or other incident that makes VMs inaccessible. Allowing a XS pool to help itself maintain the functionality of VMs is an important feature and one that plays a large role in sustaining as much uptime as possible. Permitting the servers to automatically deal with fail-overs makes system administration easier and allows for more rapid reaction times to incidents, leading to increased up-time for servers and the applications they run.
XS allows for the designation of three different treatments of Virtual Machines: (1) always restart, (2) restart if possible, and (3) do not restart. The VMs designated with the highest restart priority will be the first to be attempted to restart and all will be handled, provided adequate resources (primarily, host memory) are available. A specific start order, allowing for some VMs to be checked to be running before others, can also be established. VMs will be automatically distributed among whatever remaining XS hosts are considered active. Where necessary, note that hosts that contain expandable memory will be shrunk down to accommodate additional hosts and those hosts designated to be restarted will also be run with reduced memory, if necessary. If additional capacity exists to run more VMs, those designated as “start if possible” will be brought online. Whichever VMs that are not considered essential typically will be marked as “do not restart” and hence will be left “off” had they been running before, requiring any of those desired to be restarted to be done manually, resources permitting.
XS also allows for specifying the minimum number of active hosts to remain to accommodate failures; larger pools that are not overly populated with VMs can readily accommodate even two or more host failures.
The election of what hosts are “live” and should be considered active members of the pool follows a rather involved process of a combination of network accessibility plus access to an independent designated pooled Storage Repository (SR) that serves as an additional metric. The pooled SR can also be a fiber channel device, being independent of Ethernet connections. A quorum-based algorithm is applied to establish which servers are up and active as members of the pool and which -- in the event of a pool master failure -- should be elected the new pool master.
WHEN HA WORKS, IT WORKS GREAT
Without going into more detail, suffice it to say that this methodology works very well, however requiring a few prerequisite conditions that need to be taken into consideration. First of all, the mandate that a pooled storage device be available clearly means that a pool consisting of hosts that only make use of local storage will be precluded. Second, there is also a constraint that for a quorum to be possible, it is required to have a minimum of three hosts in the pool or HA results will be unpredictable as the election of a pool master can become ambiguous. This comes about because of the so-called “split brain” issue (http://linux-ha.org/wiki/Split_Brain) which is endemic in many different operating system environments that employ a quorum as means of making such a decision. Furthermore, while fencing (the process of isolating the host; see for example http://linux-ha.org/wiki/Fencing) is the typical recourse, the lack of intercommunication can result in a wrong decision being made and hence loss of access to VMs. Having experimented with two-host pools and the native XenServer HA, I would say that an estimate of it working about half the time is about right and from a statistical viewpoint, pretty much what you would expect.
This limitation is, however, still of immediate concern to those with either no pooled storage and/or only two hosts in a pool. With a little bit of extra network connectivity, a relatively simple and inexpensive solution to the external SR can be provided by making a very small NFS-based SR available. The second condition, however, is not readily rectified without the expense of at least one additional host and all the connectivity associated with it. In some cases, this may simply not be an affordable option.
For a number of years now, an alternative method of providing HA has been available through the program package provided by HA-Lizard (http://www.halizard.com/) , a community project that provides a free alternative that is neither dependent on external SRs nor requires a minimum of three hosts within a pool. In this blog, the focus will be on the standard HA-Lizard version and because of the particularly harder-to-handle situation of a two-node pool, it will also be the subject of discussion.
I had been experimenting for some time with HA-Lizard and found in particular that I was able to create failure scenarios that needed some improvement. HA-Lizard’s Salvatore Costantino was more than willing to lend an ear to the cases I had found and this led further to a very productive collaboration on investigating and implementing means to deal with a number of specific cases involving two-host pools. The result of these several months of efforts is a new HA-Lizard release that manages to address a number of additional scenarios above and beyond its earlier capabilities.
It is worthwhile mentioning that there are two ways of deploying HA-Lizard:
1) Most use cases combine HA-Lizard and iSCSI-HA which creates a two-node pool using local storage while maintaining full VM agility with VMs being able to run on either host. In this case, DRBD (http://www.drbd.org/) is implemented in this type of deployment and it works very well making use of the real-time storage replication.
2) HA-Lizard, only, is used with an external Storage Repository (as in this particular case).
Before going into details of the investigation, a few words should go towards a brief explanation of how this works. Note that there is only Internet connectivity (the use of a heuristic network node) and no external SR, so how is a split brain situation then avoidable?
This is how I'd describe the course of action in this two-node situation:
If a node sees the gateway, assume it's alive. If it cannot, assume it's a good candidate for fencing. If the node that cannot see the gateway is the master, it should internally kill any running VMs and surrender its ability to be the master and fence itself. The slave node should promote itself to master and attempt to restart any missing VMs. Any that are on the previous master will probably fail though, because there is no communication to the old master. If the old VMs cannot be restarted, eventually the new master will be able to restart them regardless after a toolstack restart. If the slave node fails by not being able to communicate with the network, as long as the master still sees the network and not the slave’s network, it can assume the slave needs to fence itself, kill off its VMs and assume that they will be restarted on the current master. The slave needs to realize it cannot communicate out, and therefore should kill off any of its VMs and fence itself.
Naturally, the trickier part comes with the timing of the various actions, since each node has to blindly assume the other is going to conduct a sequence of events. The key here is that these are all agreed on ahead of time and as long as each follows its own specific instructions, it should not matter that each of the two nodes cannot see the other node. In essence, the lack of communication in this case allows for creating a very specific course of action! If both nodes fail, obviously the case is hopeless, but that would be true of any HA configuration in which no node is left standing.
Various test plans were worked out for various cases and the table below elucidates the different test scenarios, what was expected and what was actually observed. It is very encouraging that the vast majority of these cases can now be properly handled.
Particularly tricky here was the case of rebooting the master server from the shell, without first disabling HA-Lizard (something one could readily forget to do). Since the fail-over process takes a while, a large number of VMs cannot be handled before the communication breakdown takes place, hence one is left with a bit of a mess to clean up in the end. Nevertheless, it’s still good to know what happens if something takes place that rightfully shouldn’t!
The other cases, whether intentional or not, are handled predictably and reliably, which is of course the intent. Typically, a two-node pool isn’t going to have a lot of complex VM dependencies, so the lack of a start order of VMs should not be perceived as a big shortcoming. Support for this feature may even be added in a future release.
HA-Lizard is a viable alternative to the native Citrix HA configuration. It’s straightforward to set up and can handle standard failover cases with a selective “restart/do not restart” setting for each VM or can be globally configured. There are a quite a number of configuration parameters which the reader is encouraged to research in the extensive HA-Lizard documentation. There is also an on-line forum which serves as a source for information and prompt assistance with issues. This most recent release 2.1.3 is supported on both XenServer 6.5 and 7.0.
Above all, HA-Lizard shines when it comes to handling a non-pooled storage environment and in particular, all configurations of the dreaded two-node pool configuration. From my direct experience, HA-Lizard now handles the vast majority of issues involved in a two-node pool and can do so more reliably than the non-supported two-node pool using Citrix’ own HA application. It has been possible to conduct a lot of tests with various cases and importantly, and to do so multiple times to ensure the actions are predictable and repeatable.
I would encourage taking a look at HA-Lizard and giving it a good test run. The software is free (contributions are accepted) and it is in extensive use and has a proven track record. For a two-host pool, I can frankly not think of a better alternative, especially with these latest improvements and enhancements.
I would also like to thank Salvatore Costantino for the opportunity to participate in this investigation and am very pleased to see the fruits of this collaboration. It has been one way of contributing to the Citrix XenServer user community that many can immediately benefit from.