
Whatever happened to XenServer's Windsor architecture?

At the 2012 Xen Project Developer Summit in San Diego I talked about the evolution of XenServer's architecture, specifically our forward-looking R&D on a set of architectural changes known as "Windsor". The architecture includes a number of foundational overhauls, such as moving to a 64-bit domain-0 with a PVops kernel and upgrading to the upstream version of qemu (XenServer currently uses a forked Xen Project version and therefore doesn't benefit from new features and improvements made in the more active upstream project). Those of you following the xenserver.org development snapshots will have seen a number of these key component overhauls already.

The more notable changes in the new architecture centre on improved modularity throughout the system, including "domain-0 disaggregation", improved intra-component modularity, and better-defined internal APIs.

We wanted to do this for various reasons including:

  1. To improve single-host scalability (e.g. the number of VMs and the amount of aggregate I/O the system can sustain) by parallelizing the handling of I/O over a number of driver domains
  2. To enable better multi-host scalability in scale-out cloud environments, primarily by allowing each host to run more independently and therefore reduce the bottleneck effect of the pool master
  3. To create the capability to have additional levels of tenant isolation by having per-tenant driver domains etc.
  4. To allow for possible future third party service VMs (driver domains etc.)


So where are we with all this? In the single-host scalability area, something that Citrix customers care a lot about, we had a parallel effort to improve scale and performance in the short term by scaling up domain-0 (i.e. adding more vCPUs and memory) and tactically removing bottlenecks. We actually did better than we expected with this, so it has reduced the urgency to build the "scale-out" disaggregated solution. Some of this work is described in Jonathan Davies' blog posts: How did we increase VM density in XenServer 6.2? and How did we increase VM density in XenServer 6.2? (part 2).
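To make the "scaling up domain-0" idea concrete, here is a minimal sketch of where those knobs live. It assumes a XenServer 6.x host booting Xen via extlinux; the dom0_mem and dom0_max_vcpus values, and the kernel, initrd and root-device names, are placeholders for illustration rather than tuning recommendations:

    # Hypothetical excerpt from /boot/extlinux.conf on a XenServer 6.x host.
    # dom0 sizing is controlled by parameters on the Xen hypervisor command
    # line; the numbers below are placeholders, not recommendations.
    label xe
      kernel mboot.c32
      append /boot/xen.gz dom0_mem=4096M,max:4096M dom0_max_vcpus=8 --- /boot/vmlinuz-<dom0-kernel> root=<root-device> ro --- /boot/initrd-<dom0-kernel>.img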

XenServer today does have some (officially unsupported) mechanisms to run driver domains. These have been used within Citrix in a promising evaluation of storage driver domains for a physical appliance running the Citrix CloudBridge product, which performs significant amounts of caching-related I/O to a very large number of local SSDs spread across a number of RAID controllers. This is an area where the scale-out parallelism of Windsor is well suited.
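For readers unfamiliar with the concept, here is a rough sketch of what a storage driver domain looks like using the generic Xen Project xl tooling (not XenServer's own unsupported mechanism, which is wired up differently); the domain names, kernel path and PCI address are all hypothetical:

    # storage-dd.cfg - hypothetical xl config for a storage driver domain.
    # The domain is given the storage controller by PCI passthrough and is
    # flagged as a backend provider so the driver-domain hotplug machinery runs.
    name          = "storage-dd"
    kernel        = "/boot/vmlinuz-driverdomain"   # placeholder kernel
    memory        = 2048
    vcpus         = 2
    pci           = [ "0000:03:00.0" ]             # placeholder RAID/HBA address
    driver_domain = 1

    # guest.cfg - a guest whose virtual disk backend runs in "storage-dd"
    # rather than in domain-0.
    name   = "guest1"
    memory = 4096
    vcpus  = 2
    disk   = [ "format=raw, vdev=xvda, backend=storage-dd, target=/dev/vg0/guest1" ]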

On the multi-host scalability side we've made some changes to both XenServer and Apache CloudStack (the foundation of the Citrix CloudPlatform cloud orchestration product) to reduce the load on the pool master and therefore make it possible to use the maximum resource pool size. For the longer term we're evaluating the overlap between XenServer's pool-based clustering and the various forms of host aggregation offered by orchestration stacks such as CloudStack and OpenStack. With the orchestration stacks' ability to manage a large number of hosts do we really need to indirect all XenServer commands through a pool master?
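To make the pool-master indirection concrete, here is a small sketch using the XenAPI Python bindings of what every client or orchestration layer has to do today: a login against an ordinary pool member is refused with HOST_IS_SLAVE and the caller must redirect to the master. The host address and credentials are placeholders.

    import XenAPI  # the XenAPI Python bindings shipped with the XAPI toolstack

    def connect(host, user, password):
        """Log in to a XenServer host, following the pool-master redirect.

        Today every XAPI call for the whole pool has to be issued against the
        master; a member host refuses the login and names the master instead.
        """
        try:
            session = XenAPI.Session("https://%s" % host)
            session.xenapi.login_with_password(user, password)
            return session
        except XenAPI.Failure as f:
            if f.details[0] == "HOST_IS_SLAVE":
                master = f.details[1]
                session = XenAPI.Session("https://%s" % master)
                session.xenapi.login_with_password(user, password)
                return session
            raise

    # Placeholder address and credentials, for illustration only.
    session = connect("192.0.2.10", "root", "password")
    print(len(session.xenapi.VM.get_all_records()))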

Disaggregation has already taken place in the Xen Project XAPI toolstack used in XenServer. A prerequisite to moving the xapi daemon into a service VM was to split the higher-level clustering and policy part of the daemon from the low-level VM lifecycle management and hypervisor interface. From XenServer 6.1 the latter function was split into a separate daemon called xenopsd, with the original xapi daemon continuing to perform the clustering and policy functions. A similar split has been made in the network management part of the stack, separating the network control function into xcp-networkd; this created immediate value in the form of a better-defined internal API, but is also a prerequisite for network driver domains. The current development version of the XAPI project has had a number of other modularity clean-ups, including various services being split into separate daemons with better build and packaging separation.
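As a rough illustration of the split, on a recent host the toolstack shows up as several cooperating daemons rather than a single monolithic xapi process (the exact set varies by release):

    # Quick check on a XenServer host or XAPI development build
    ps -e | grep -E 'xapi|xenopsd|xcp-networkd'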

We're also using intra-component disaggregation for XenServer's virtual GPU (vGPU) support. A "discrete" emulator (DEMU) provides the glue that allows the GPU vendor's control-plane multiplexer driver in domain-0 to service the control-path parts of the guest VM's vGPU accesses. This is done by, in effect, disaggregating qemu and having the DEMU take ownership of the I/O ports associated with the device it is emulating. This mechanism is now being added to the Xen Project to allow other virtual devices to be handled by discrete emulators, perhaps even in separate domains. Eventually we'd like to put the DEMUs and the GPU driver into a driver domain to decouple the maintenance of domain-0 from that of the GPU driver (in particular the required kernel version).

I view Windsor as being like a concept car: a way to try out new ideas and get feedback on their value and desirability. Like a concept car, some of Windsor's ideas have made it into the shipping XenServer releases, some are coming, some are on the wishlist and some will never happen. Having a forward-looking technology pipeline helps us ensure that we keep evolving XenServer to meet users' needs both now and in the future.

Comments

Guest - james.cannon@citrix.com on Thursday, 01 May 2014 22:10

Any work being done by Citrix partners for guest domains to interact with SANs natively?

Tobias Kreidl on Saturday, 03 May 2014 15:42

@James C: You can do that with Linux VMs -- and have been able to for years; it's just not supported by Citrix. We have run some systems this way literally for years, in part because of the improved performance, and in one case to surmount the seven-storage-devices limitation for a VM. As shown in this article http://discussions.citrix.com/topic/289937-some-benchmarks-for-a-xenserver-using-a-pooled-sr-and-assigned-vdi-vs-using-open-iscsi-for-a-direct-vm-connection-to-storage-on-an-md3600i-iscsi-device/ you can achieve up to about four times the performance. While XenMotion is still possible, storage XenMotion of course is not. There was a minor change needed in the RC order of events to make sure certain timeouts didn't take place, plus the addition of the "_netdev" parameter to the mount point entry in fstab on the client (plus some optimization with various other parameters in fstab and some client kernel parameter optimizations, but nothing else really mandatory on the XenServer itself).

@James B: The point is that there is still a lot of overhead that makes the SR model slow compared to bare metal, VMware or Hyper-V storage implementations. With much faster storage options available these days, this seems like a critical bottleneck that needs to be seriously addressed. Even with SSD drives, Felipe Franciosi pointed out in articles such as http://www.xenserver.org/blog/blogger/listings/franciozzy.html that you hit a wall when device throughput gets high enough that XenServer just cannot keep up.

I also have long wondered why some better redundancy options like pNFS (parallel NFS) have not been incorporated, given the lack of much else for pooled thin-provisioned storage (and now with Inktank -- and with it, Ceph -- being absorbed by RedHat, who knows what's going to happen there?).

While I can understand the economics of building on the current model, I worry that parts of it are not scaling well, even if stuffing more VMs onto a server does. IMO, storage I/O is not only the weak link in the chain, but is likely one of, if not the most expensive component of a virtual environment these days.

-=Tobias

James Bulpin on Wednesday, 07 May 2014 16:47

We are looking at using the kernel mode block backend (blkback) for raw block devices, such as LVM (not LVHD) on SSDs and raw SAN LUNs. At the moment everything goes via blktap and into userspace tapdisk which can create additional latency and reduce throughput.

Tobias Kreidl on Saturday, 03 May 2014 16:10

We have done this for years on Linux boxes with directly-attached iSCSI connections, with significant performance gains. I wrote an article on the Citrix forum about this, tested on an MD3600i storage device. XenMotion even works (though of course, not storage XenMotion) and only a couple of minor changes were needed on dom0. I have posted a longer response that is evidently being moderated because of its embedded links. Direct Windows iSCSI connectivity is a whole different, more complex situation.

Guest - james on Saturday, 03 May 2014 17:26

James,
Thanks for the article. It is a little sad to hear that the dom0 disaggregation has slowed down!
It bothers me to see XS losing ground to VMware and MS. My 2c: XS needs razor focus on these areas - storage (VHDX, Ceph support, robust snapshots) and overall perf and stability improvements - which are underway in the newly planned activities.

Tim Mackey on Monday, 05 May 2014 13:49

@james, I'd like to better understand what your requirements are on the storage front. I don't disagree that some of what you list are great ideas, but at the same time focus dictates that we understand requirements. We have been putting energy into overall performance and associated stability with XS 6.2. More work to be done there for sure, but we did make some pretty decent strides.

-tim

Guest - james on Monday, 16 June 2014 17:24

@Tim -
Ceph block support could be low-hanging fruit for XS which would add huge value to any XS deployment; with Red Hat taking over Ceph this is getting interesting.
VHDX support is on the cards but is not low-hanging fruit.
What is worrying is the overall direction for XAPI - no doubt it has seen some excellent work recently, but confusing messaging and strategy is making community adoption slow - if anything, it is losing user base in OpenStack.
The dilemma over Xen+libvirt vs. XAPI in its current form needs to be resolved -
there is some valuable feedback about XAPI on the lists which seems to have received no attention:
http://lists.xenproject.org/archives/html/xen-api/2014-06/msg00034.html

