Virtualization Blog

Discussions and observations on virtualization.

CPU Feature Levelling in Dundee

As anyone with more than one type of server is no doubt aware, CPU feature levelling is the method by which we try to make it safe for VMs to migrate. If each host in a pool is identical, this is easy. However if the pool has non-identical hardware, we must hide the differences so that a VM which is migrated continues without crashing.

I don't think I am going to surprise anyone by admitting that the old way XenServer did feature levelling was clunky and awkward to use. Because of a step change introduced in Intel Ivy Bridge CPUs, feature levelling also ceased to work correctly. As a result, we took the time to redesign it, and the results are available for use in Dundee beta 2.

Background

When a VM boots, the kernel looks at the available featureset and in general, turn as much on as it knows how to. Linux will go so far as to binary patch some of the hotpaths for performance reasons. Userspace libraries frequently have the same algorithm compiled multiple times, and will select the best one to use at startup, based on which CPU instructions are available.

On a bare metal system, the featureset will not change while the OS is running, and the same expectations exist in the virtual world.  Migration introduces a problem with this expectation; it is literally like unplugging the hard drive and RAM from one computer, plugging it into another, and letting the OS continue from where it was. For the VM not to crash, all the features it is using must continue to work at the destination, or in other words, features which are in use must not disappear at any point.

Therefore, the principle of feature levelling is to calculate the common subset of features available across the pool, and restrict the VM to this featureset. That way the VMs featureset will always work, no matter which pool member it is currently running on.

Hardware

There are many factors affecting the available featureset for VMs to use:

  • The CPU itself
  • The BIOS/firmware settings
  • The hypervisor command line parameters
  • The restrictions which the toolstack chooses to apply

It is also worth identifying that firmware/software updates might reduce the featureset (e.g. Intel releasing a microcode update to disable TSX on Haswell/Broadwell processors).

Hiding features is also tricky; x86 provides no architectural means to do so. Feature levelling is therefore implemented using vendor-specific extensions, which are documented as unstable interfaces and liable to change at any point at any point inf the future (as happened with Ivy Bridge).

XenServer Pre Dundee

Older versions of XenServer would require a new pool member to be identical before it would be permitted to join. Making this happen involved consulting the xe host-cpu-info features, divining some command line parameters for Xen and rebooting.

If everything went to plan, the new slave would be permitted to join the pool. Once a pool had been created, it was assumed to be homogeneous from that point on. The command line parameters had an effect on the entire host, including Xen itself. In a slightly heterogeneous case, the difference in features tended to be features which only Xen would care to use, so was needlessly penalised along with dom0.

XenServer Dundee

In Dundee, we have made some changes:

There are two featuresets rather than one. PV and HVM guests are fundamentally different types of virtualisation, and come with different restrictions and abilities. HVM guests will necessarily have a larger potential featureset than PV, and having a single featureset which is safe for PV guests would apply unnecessary restrictions to HVM guests.

The featuresets are recalculated and updated every boot. Assuming that servers stay the same after initial configuration is unsafe, and incorrect.

So long as the CPU Vendor is the same (i.e. all Intel or all AMD), a pool join will be permitted to happen, irrespective of the available features. The pool featuresets are dynamically recalculated every time a slave joins or leaves a pool, and every time a pool member reconnects after reboot.

When a VM is started, it will be given the current pool featureset. This permits it to move anywhere in the pool, as the pool existed when it started. Changes in pool level have no effect on running VMs. Their featureset is fixed at boot (and is fixed across migrate, suspend/resume, etc.), which matches the expectation of the OS. (One release note is that to update the VM featureset, it must be shut down fully and restarted. This is contrary to what would be expected with a plain reboot, and exists because of some metatdata caching in the toolstack which is proving hard to untangle.)

Migration safety checks are performed between the VMs fixed featureset and the destination hosts featureset. This way, even if the pool level drops because a less capable slave joined, an already-running VM will still be able to move anywhere except the new, less capable slave.

The method of hiding features from VMs now has no effect on Xen and dom0. They never migrate, and will have access to all the available features (albeit with dom0 subject to being a VM in the first place).

Conclusion

We hope that the changes listed above will make Dundee far easier to use in a heterogeneous setup.

All of this information applies equally to inter-pool migration (Storage XenMotion) as intra-pool migration.  In the case of upgrade from older versions of XenServer (Rolling Pool Upgrade, Storage XenMotion again), there is a complication because of having to fill in some gaps in the older toolstacks idea of the VMs featureset.  In such a case, an incoming VM is assumed to have the featureset of the host it lands on, rather than the pool level.  This matches the older logic (and is the only safe course of action), but does result in the VM possibly having a higher featureset than the pool level, and being less mobile as a result.  Once the VM has been shut down and started back up again, it will be given the regular pool level and behave normally.

To summarise the two key points:

  • We expect feature levelling to work between any combination of CPUs from the same vendor (with the exception of PV guests on pre-Nehalem CPUs, which lacked any masking capabilities whatsoever).
  • We expect there to be no need for any manual feature configuration to join a pool together.

That being said, this is a beta release and I highly encourage you to try it out and comment/report back.

Recent Comments
Tobias Kreidl
This is good news; it's been very inconvenient not being able to mix servers with Haswell-based processors with other Intel chip s... Read More
Wednesday, 23 December 2015 16:22
Tobias Kreidl
I would add that a re-calculation of the feature set after each reboot is an excellent idea as I have seen the CPU mask change as ... Read More
Wednesday, 23 December 2015 16:35
Willem Boterenbrood
Good news indeed, I think it is important to make sure that features you include in XenServer (or any product) are going to keep w... Read More
Friday, 25 December 2015 01:20
Continue reading
16997 Hits
3 Comments

XenServer Driver Disks

Why do we need them

A matter of importance for any OS to consider, is how to support hardware. One part of that support is going to be the management of device drivers used to talk to network cards, storage controllers and other such devices.

For each XenServer release, Citrix works with hardware vendors to try and get the 'latest and greatest' (and obviously stable!) drivers for supporting upcoming hardware inbox - which is for the most part great. Customers installing XenServer can successfully install XenServer on their new hardware, and they don't need to think about drivers at all.

This is all good and well except for the fact that sometimes release schedules don't align so perfectly, and actually several months after a release an OEM may start shipping a new hardware that requires updated drivers. Or perhaps a customer discovers a issue with their system which is caused by a bug in the inbox device driver.

In either case, we need a mechanism to be able to provide the customer with a new driver: enter driver disks.

What are they

A 'Driver Disk' is a particular instance of a 'Supplemental Pack' which is the mechanism that XenServer uses for supporting the installation of RPMs in Dom0.

The format of a driver disk is simply a collection of RPMs, along with some meta-data regarding the packs contents and pack dependencies (obviously RPMs include their own dependency metadata too).

These Driver disks are built by hardware vendors using the 'Driver Development Kit' (DDK) which contains a Dom0 kernel for building drivers against.

The vendor simply:

  1. Creates a Makefile/specfile for generating a driver RPM.
  2. Uses build-supplemental-pack.sh to compile the driver RPM, and build an ISO with the appropriate metadata.

After which the vendor can use the ISO to install the new drivers on an appropriate XenServer version. Once hardware has been certified (using our self-certification kits) with the new drivers, the vendor can go ahead and submit the drivers to Citrix who will make them available for customers to download from support.citrix.com.

Respins

The astute among you, who are familiar with the Citrix support pages may have noticed that we don't just release a single driver disk for any given XenServer release, we actually release multiple.

The reason for this is that each driver is built against a specific kernel Application Binary Interface 'ABI' - which has a particular version number. When this ABI version is bumped, because of modifications to the kernel in a kernel hotfix, the new kernel will no longer load a driver module built against the old ABI.

This is why customers might encounter the following scenario:

  1. Customer installs XenServer 6.2 with driver foo x.y.z
  2. Customer installs a driver disk for foo x.y.z+1
  3. Customer installs a kernel hotfix
  4. Customer notices that XenServer is now using foo x.y.z (!)

In this scenario, because the customer has installed a kernel hotfix on 6.2, without installing a re-spun driver disk for foo x.y.z+1, the new kernel, shipped with the kernel hotfix, will load the driver included in the GA version of 6.2. This is in order to prevent customers from having to take driver updates with kernel hotfixes.

So for each kernel hotfix, Citrix will re-spin driver disks (previously released for that product version), and post them on support.citrix.com for customers to install. This gives customers maximum flexibility in being able to choose which kernel hotfixes/drivers they want to take, independently from one another.

More info

If you want to find out more about the packaging of driver disks, or in fact how to build supplemental packs - then please checkout the official documentation here:

 

http://docs.vmd.citrix.com/XenServer/6.2.0/1.0/en_gb/supplemental_pack_ddk.html     

Recent Comments
Robert
I'm wondering why there are not real word examples on how to build/create XenServer Driver Disks for a fresh installation. As exa... Read More
Tuesday, 30 September 2014 10:11
Continue reading
21042 Hits
2 Comments

About XenServer

XenServer is the leading open source virtualization platform, powered by the Xen Project hypervisor and the XAPI toolstack. It is used in the world's largest clouds and enterprises.
 
Commercial support for XenServer is available from Citrix.