At the 2012 Xen Project Developer Summit in San Diego I talked about the evolution of XenServer's architecture, specifically our forward-looking R&D work on a set of architectural changes known as "Windsor". The architecture includes a number of foundational overhauls, such as moving to a 64-bit domain-0 with a PVops kernel and upgrading to the upstream version of qemu (XenServer currently uses a forked Xen Project version and therefore doesn't benefit from new features and improvements made in the more active upstream project). Those of you following the xenserver.org development snapshots will already have seen a number of these key component overhauls.
The more notable changes in the new architecture include various forms of improved modularity within the system including "domain-0 disaggregation" as well as improved intra-component modularity and better internal APIs.
We wanted to do this for various reasons including:
- To improve single-host scalability (e.g. the number of VMs and the amount of aggregate I/O the system can sustain) by parallelizing the handling of I/O over a number of driver domains
- To enable better multi-host scalability in scale-out cloud environments, primarily by allowing each host to run more independently and therefore reduce the bottleneck effect of the pool master
- To create the capability to have additional levels of tenant isolation by having per-tenant driver domains etc.
- To allow for possible future third-party service VMs (driver domains etc.)
So where are we with this? In the single-host scalability area, something that Citrix customers care a lot about, we ran a parallel effort to improve scale and performance in the short term by scaling up domain-0 (i.e. adding more vCPUs and memory) and tactically removing bottlenecks. This went better than we expected, which has reduced the urgency to build the "scale-out" disaggregated solution. Some of this work is described in Jonathan Davies' blog posts "How did we increase VM density in XenServer 6.2?" and "How did we increase VM density in XenServer 6.2? (part 2)".
XenServer today does have some (officially unsupported) mechanisms to run driver domains. These have been used within Citrix in a promising evaluation of storage driver domains for a physical appliance running the Citrix CloudBridge product, which performs significant amounts of caching-related I/O to a very large number of local SSDs spread across several RAID controllers. This is an area to which the scale-out parallelism of Windsor is well suited.
On the multi-host scalability side we've made some changes to both XenServer and Apache CloudStack (the foundation of the Citrix CloudPlatform cloud orchestration product) to reduce the load on the pool master and therefore make it practical to use the maximum resource pool size. For the longer term we're evaluating the overlap between XenServer's pool-based clustering and the various forms of host aggregation offered by orchestration stacks such as CloudStack and OpenStack. Given the orchestration stacks' ability to manage a large number of hosts, do we really need to route every XenServer command through a pool master?
Disaggregation has also taken place in the Xen Project XAPI toolstack used in XenServer. A prerequisite for moving the xapi daemon into a service VM was to split the higher-level clustering and policy part of the daemon from the low-level VM lifecycle management and hypervisor interface. From XenServer 6.1 the latter function was split into a separate daemon called xenopsd, with the original xapi daemon performing the clustering and policy functions. In the network management part of the stack a similar split has been made to separate the network control function into xcp-networkd. This created immediate value by providing a better-defined internal API, and it is also a prerequisite for network driver domains. The current development version of the XAPI project has had a number of other modularity clean-ups, including various services being split into separate daemons with better build and packaging separation.
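To make the shape of the xapi/xenopsd split more concrete, here is a minimal Python sketch of the general pattern: a policy layer that decides where a VM should run and then delegates the actual start to a narrow per-host lifecycle interface. It is purely illustrative. `VmLifecycle`, `FakeHost`, `PoolPolicy` and the least-loaded placement rule are hypothetical names and choices made for this example; they are not the real xapi or xenopsd interfaces.

```python
from abc import ABC, abstractmethod
from typing import Dict, List


class VmLifecycle(ABC):
    """Hypothetical low-level, per-host interface (the role xenopsd plays)."""

    @abstractmethod
    def start(self, vm: str) -> None: ...

    @abstractmethod
    def running(self) -> List[str]: ...


class FakeHost(VmLifecycle):
    """In-memory stand-in for a host; it only records which VMs were started."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._vms: List[str] = []

    def start(self, vm: str) -> None:
        self._vms.append(vm)

    def running(self) -> List[str]:
        return list(self._vms)


class PoolPolicy:
    """Hypothetical higher-level clustering/policy layer (the role left with xapi).

    It never touches the hypervisor directly: it picks a host and delegates
    through the VmLifecycle interface only.
    """

    def __init__(self, hosts: Dict[str, VmLifecycle]) -> None:
        self.hosts = hosts

    def start_vm(self, vm: str) -> str:
        # Toy placement policy: least-loaded host, ties broken by host name.
        name = min(self.hosts, key=lambda h: (len(self.hosts[h].running()), h))
        self.hosts[name].start(vm)
        return name
```

Because the policy layer depends only on the narrow lifecycle interface, the component behind that interface could in principle run somewhere else, such as a service VM, without the policy code changing. That is the property that makes the split a prerequisite for disaggregation.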
We're also using intra-component disaggregation for XenServer's virtual GPU (vGPU) support. A "discrete" emulator (DEMU) provides the glue that allows the GPU vendor's control plane multiplexer driver in domain-0 to service the control path parts of the vGPU access from the guest VM. This is done by, in effect, disaggregating qemu and having the DEMU take ownership of the I/O ports associated with the device it is emulating. This mechanism is now being added to the Xen Project to allow other virtual devices to be handled by discrete emulators, perhaps even in separate domains. Eventually we'd like to put the DEMUs and the GPU driver into a driver domain to decouple the maintenance of domain-0 from that of the GPU driver (particularly the kernel version the driver requires).
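The idea of a discrete emulator "taking ownership" of a device's I/O ports can be pictured as a routing table. Each emulator claims a port range, and any guest access to a port nobody has claimed falls through to the default device model (qemu). The Python sketch below models only that routing decision. `IoDispatcher`, the handler signature, the emulator names and the port numbers are all hypothetical and chosen for illustration; this is not Xen's actual interface for registering emulators.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical handler signature: (port, is_write, value) -> value read, or None for a write.
Handler = Callable[[int, bool, Optional[int]], Optional[int]]


@dataclass
class PortRange:
    start: int  # first I/O port, inclusive
    end: int    # last I/O port, inclusive
    owner: str  # name of the emulator that claimed the range
    handler: Handler


@dataclass
class IoDispatcher:
    """Routes guest I/O port accesses to whichever emulator claimed the port,
    falling back to a default device model (standing in for qemu)."""
    default_owner: str
    default_handler: Handler
    ranges: List[PortRange] = field(default_factory=list)

    def claim(self, start: int, end: int, owner: str, handler: Handler) -> None:
        # Each port may have only one owner, so overlapping claims are rejected.
        for r in self.ranges:
            if start <= r.end and r.start <= end:
                raise ValueError(f"ports {start:#x}-{end:#x} overlap range owned by {r.owner}")
        self.ranges.append(PortRange(start, end, owner, handler))

    def dispatch(self, port: int, is_write: bool,
                 value: Optional[int] = None) -> Tuple[str, Optional[int]]:
        for r in self.ranges:
            if r.start <= port <= r.end:
                return r.owner, r.handler(port, is_write, value)
        return self.default_owner, self.default_handler(port, is_write, value)


def qemu_handler(port: int, is_write: bool, value: Optional[int]) -> Optional[int]:
    # Toy default device model: ignores writes, reads back 0xFF.
    return None if is_write else 0xFF


class FakeVgpuRegs:
    """Toy register file standing in for a discrete emulator's device state."""

    def __init__(self) -> None:
        self.regs: Dict[int, int] = {}

    def __call__(self, port: int, is_write: bool, value: Optional[int]) -> Optional[int]:
        if is_write:
            self.regs[port] = value
            return None
        return self.regs.get(port, 0)
```

For example, after `d = IoDispatcher("qemu", qemu_handler)` and `d.claim(0xC000, 0xC0FF, "demu", FakeVgpuRegs())`, accesses to ports in `0xC000`-`0xC0FF` reach the DEMU stand-in, while every other port is still serviced by the qemu stand-in.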
I view Windsor like a concept car: a way to try out new ideas and get feedback on their value and desirability. As with a concept car, some of Windsor's ideas have made it into shipping XenServer releases, some are coming, some are on the wishlist and some will never happen. Having a forward-looking technology pipeline helps us ensure that we keep evolving XenServer to meet users' needs both now and in the future.