The XenServer team has made a number of significant performance and scalability improvements in the XenServer 7.0 release. This is the fourth in a series of articles that will describe the principal improvements. For the previous ones, see:
In this article we return to the theme of I/O throughput. Specifically, we focus on improvements to the total throughput achieved by a number of VMs performing I/O concurrently. Measurements show that XenServer 7.0 achieves aggregate network throughput more than three times that of XenServer 6.5, along with an improvement in aggregate storage throughput.
What limits aggregate I/O throughput?
When a number of VMs are performing I/O concurrently, the total throughput that can be achieved is often limited by dom0 becoming fully busy, meaning it cannot do any additional work per unit time. The I/O backends (netback for network I/O and tapdisk3 for storage I/O) together consume 100% of available dom0 CPU time.
How can this limit be overcome?
Whenever there is a CPU bottleneck like this, there are two possible approaches to improving the performance:
- Reduce the amount of CPU time required to perform I/O.
- Increase the processing capacity of dom0, by giving it more vCPUs.
Surely approach 2 is easy and will give a quick win...? Intuitively, we might expect the total throughput to increase proportionally with the number of dom0 vCPUs.
Unfortunately it's not as straightforward as that. The following graph shows what happens to the aggregate network throughput on XenServer 6.5 when the number of dom0 vCPUs is artificially increased. (In this case, we are measuring the total network throughput of 40 VMs communicating amongst themselves on a single Dell R730 host.)
Counter-intuitively, the aggregate throughput decreases as we add more processing power to dom0! (This explains why the default was at most 8 vCPUs in XenServer 6.5.)
So is there no hope for giving dom0 more processing power...?
The explanation for the degradation in performance is that certain operations run more slowly when there are more vCPUs present. In order to make dom0 work better with more vCPUs, we needed to understand what those operations are, and whether they can be made to scale better.
Three such areas of poor scalability were discovered deep in the innards of Xen by Malcolm Crossley and David Vrabel, and improvements were made for each:
- Maptrack lock contention – improved by http://xenbits.xen.org/gitweb/?p=xen.git;a=commit;h=dff515dfeac4c1c13422a128c558ac21ddc6c8db
- Grant-table lock contention – improved by http://xenbits.xen.org/gitweb/?p=xen.git;a=commitdiff;h=b4650e9a96d78b87ccf7deb4f74733ccfcc64db5
- TLB flush on grant-unmap – improved by https://github.com/xenserver/xen-4.6.pg/blob/master/master/avoid-gnt-unmap-tlb-flush-if-not-accessed.patch
The result of improving these areas is dramatic – see the green line in the following graph:
Now, throughput scales very well as the number of vCPUs increases. This means that, for the first time, it is now beneficial to allocate many vCPUs to dom0 – so that when there is demand, dom0 can deliver. Hence we have given XenServer 7.0 a higher default number of dom0 vCPUs.
How many vCPUs are now allocated to dom0 by default?
Most hosts will now get 16 vCPUs by default, but the exact number depends on the number of CPU cores on the host. The following graph summarises how the default number of dom0 vCPUs is calculated from the number of CPU cores on various current and historic XenServer releases:
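As a rough illustration of this kind of sizing rule (the exact per-release curve is given by the graph above, and this sketch is only an assumption), the 7.0-style default can be thought of as "the host's core count, capped at 16":

```shell
# Illustrative sketch only: assume the default dom0 vCPU count is the
# host's core count, capped at 16. The real per-release rule is the one
# shown in the graph above.
dom0_vcpus() {
  cores=$1
  if [ "$cores" -lt 16 ]; then
    echo "$cores"
  else
    echo 16
  fi
}
dom0_vcpus 8    # prints 8
dom0_vcpus 72   # prints 16
```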
Summary of improvements
I will conclude with some aggregate I/O measurements comparing XenServer 6.5 and 7.0 under default settings (no dom0 configuration changes) on a Dell R730xd.
- Aggregate network throughput – twenty pairs of 32-bit Debian 6.0 VMs sending and receiving traffic generated with iperf 2.0.5.
- Aggregate storage IOPS – twenty 32-bit Windows 7 SP1 VMs each doing single-threaded, serial, sequential 4KB reads with fio to a virtual disk on an Intel P3700 NVMe drive.
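For reference, a fio job file along the following lines would approximate the storage workload described (the exact job file used is not shown; the device path and job name here are assumptions):

```ini
; seqread.fio -- an approximation of the workload described above:
; single-threaded, serial, sequential 4KB reads.
; The device path is an example, not the one used in the measurements.
[seqread]
filename=/dev/xvdb
rw=read
bs=4k
numjobs=1
iodepth=1
direct=1
```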
The XenServer team has made a number of significant performance and scalability improvements in the XenServer 7.0 release. This is the third in a series of articles that will describe the principal improvements. For the first two, see here:
The topic of this post is control plane performance. XenServer 7.0 achieves significant performance improvements through the support for parallel VBD operations in xenopsd. With the improvements, xenopsd is able to plug and unplug many VBDs (virtual block devices) at the same time, substantially reducing the duration of VM lifecycle operations (start, migrate, shutdown) for VMs with many VBDs, and making it practical to operate VMs with up to 255 VBDs.
Background of the VM lifecycle operations
In XenServer, xenopsd is the dom0 component responsible for VM lifecycle operations:
- during a VM start, xenopsd creates the VM container and then plugs the VBDs before starting the VCPUs;
- during a VM shutdown, xenopsd stops the VCPUs and then unplugs the VBDs before destroying the VM container;
- during a VM migrate, xenopsd creates a new VM container, unplugs the VBDs of the old VM container, and plugs the VBDs for the new VM before starting its VCPUs; while the VBDs are being unplugged and plugged on the other VM container, the user experiences VM downtime because both old and new VM containers are paused and the VM is unresponsive.
Measurements have shown that a large part (usually most) of the duration of these VM lifecycle operations is spent plugging and unplugging the VBDs, especially on slow or contended storage backends.
Why does xenopsd take some time to plug and unplug the VBDs?
The completion of a xenopsd VBD plug operation involves the execution of two storage layer operations, VDI attach and VDI activate (where VDI stands for virtual disk image). These VDI operations involve control-plane manipulation of daemons, block devices and disk metadata in dom0, and take different amounts of time to execute depending on the type of the underlying Storage Repository (SR, such as LVM, NFS or iSCSI) used to hold the VDIs, on the current load on the storage backend disks, and on the type of those disks (SSDs or HDDs). Similarly, the completion of a xenopsd VBD unplug operation involves the execution of two storage layer operations, VDI deactivate and VDI detach, with the corresponding overhead of manipulating the control plane of the storage layer.
If the underlying physical disks are under high load, there may be contention preventing progress of the storage layer operations, and therefore xenopsd may need to wait many seconds before the requests to plug and unplug the VBDs can be served.
Originally, xenopsd would execute these VBD operations sequentially, and the total time to finish all of them for a single VM would depend on the number of VBDs in the VM. Essentially, it would be the sum of the time to operate each of the VBDs of this VM, which could mean a wait of several minutes for a lifecycle operation on a VM that had, for instance, 255 VBDs.
What are the advantages of parallel VBD operations?
Plugging and unplugging the VBDs in parallel in xenopsd:
- provides a total duration for the VM lifecycle operations that is independent of the number of VBDs in the VM. This duration will typically be the duration of the longest individual VBD operation amongst the parallel VBD operations for that VM;
- provides a significant, immediately noticeable improvement for the user across all VBD operations involving more than one VBD per VM. The more devices involved, the larger the improvement, up to the point where the underlying storage layer saturates;
- this single improvement is immediately applicable across all of VM start, VM shutdown and VM migrate lifecycle operations.
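The sum-versus-maximum effect described above can be sketched with some made-up per-VBD plug times:

```shell
# Toy illustration of the sum-versus-maximum effect (times in seconds
# are made up): sequential plugging takes the sum of the per-VBD times,
# while parallel plugging approaches the longest individual time.
times="3 5 2 4"
sum=0
max=0
for t in $times; do
  sum=$((sum + t))
  if [ "$t" -gt "$max" ]; then
    max=$t
  fi
done
echo "sequential: ${sum}s, parallel: ${max}s"   # prints "sequential: 14s, parallel: 5s"
```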
Are there any disadvantages or limitations?
Plugging and unplugging VBDs uses dom0 memory. The main disadvantage of doing these in parallel is that dom0 needs more memory to handle all the parallel operations. To prevent situations where a large number of such operations would cause dom0 to run out of memory, we have added two limits:
- the maximum number of global parallel operations that xenopsd can request is the same as the number of xenopsd worker-pool threads, as defined by worker-pool-size in /etc/xenopsd.conf. This prevents a regression in maximum dom0 memory usage compared to when xenopsd performed VBD operations sequentially. Increasing this value will increase the number of parallel VBD operations, at the expense of having to increase dom0 memory by about 15MB for each extra parallel VBD operation.
- the maximum number of per-VM parallel operations that xenopsd can request is currently fixed to 10, which covers a wide range of VMs and still provides a 10x improvement in lifecycle operation times for those VMs that have more than 10 VBDs.
Where do I find the changes?
The changes that implemented this feature are available in github at https://github.com/xapi-project/xenopsd/pull/250
What sort of theoretical improvements should I expect in XenServer 7.0?
The exact numbers depend on the SR type, the storage backend load characteristics, and the limits specified in the previous section. Given those limits, the duration of VBD plugs for a single VM will follow the pattern in the following table:
| Number n of VBDs/VM | Improvement of VBD operations |
| --- | --- |
| n <= 10 VBDs/VM | n times faster |
| n > 10 VBDs/VM | 10 times faster |
The table above assumes that the maximum number of global parallel operations discussed earlier is not reached. If you want to guarantee the improvement in the table for x > 1 simultaneous VM lifecycle operations, at the expense of using more dom0 memory in the worst case, you will probably want to set worker-pool-size = (n * x) in /etc/xenopsd.conf, where n reflects the average number of VBDs/VM amongst all VMs, up to a maximum of n = 10.
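For example, with an average of 8 VBDs/VM and up to 4 simultaneous lifecycle operations, that rule would suggest the following (the numbers here are purely illustrative):

```ini
# /etc/xenopsd.conf (fragment) -- hypothetical sizing: n = 8, x = 4
worker-pool-size = 32
```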
What sort of practical improvements should I expect in XenServer 7.0?
The VBD plug and unplug operations are only part of the work necessary to execute a VM lifecycle operation. The remaining parts, such as creation of the VM container and VIF plugs, dilute the VBD improvements of the previous section, though they remain significant. Some examples of improvements, using an EXT SR on a local SSD storage backend:
| VM lifecycle operation | Improvement with 8 VBDs/VM |
| --- | --- |
| Toolstack time to start a single VM | ~2s faster |
| Toolstack time to bootstorm 125 VMs | |
The approximately 2s improvement in single VM start time was caused by plugging the 8 VBDs in parallel. As we see in the second row of the table, this can be a significant advantage in a bootstorm.
In XenServer 7.0, not only does xenopsd execute VBD operations in parallel, it also benefits from improvements in storage layer operation times on VDIs. You may therefore observe VM lifecycle time improvements in your XenServer 7.0 environment beyond those expected from parallel VBD operations alone, compared to XenServer 6.5 SP1.
The XenServer team has made a number of significant performance and scalability improvements in the XenServer 7.0 release. This is the second in a series of articles that will describe the principal improvements. For the first, see http://xenserver.org/blog/entry/dundee-tapdisk3-polling.html.
The topic of this post is network I/O performance. XenServer 7.0 achieves significant performance improvements through the support for multi-queue paravirtualised network interfaces. Measurements of one particular use-case show an improvement from 17 Gb/s to 41 Gb/s.
A bit of background about the PV network datapath
To the guest OS, the netfront driver feels just like a physical network device. When a guest wants to transmit data:
- Netfront puts references to the page(s) containing that data into a "Transmit" ring buffer it shares with dom0.
- Netback in dom0 picks up these references and maps the actual data from the guest's memory so it appears in dom0's address space.
- Netback then hands the packet to the dom0 kernel, which uses normal routing rules to determine that it should go to an Open vSwitch device and then on to either a physical interface or the netback device for another guest on the same host.
When dom0 has a network packet it needs to send to the guest, the reverse procedure applies, using a separate "Receive" ring.
Amongst the factors that can limit network throughput are:
- the ring becoming full, causing netfront to have to wait before more data can be sent, and
- the netback process fully consuming an entire dom0 vCPU, meaning it cannot go any faster.
Multi-queue alleviates both of these potential bottlenecks.
What is multi-queue?
Rather than having a single Transmit and Receive ring per virtual interface (VIF), multi-queue means having multiple Transmit and Receive rings per VIF, and one netback thread for each:
Now, each TCP stream has the opportunity to be driven through a different Transmit or Receive ring. The particular ring chosen for each stream is determined by a hash of the TCP header (MAC, IP and port number of both the source and destination).
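Purely to illustrate the idea of hashing a flow onto a queue, here is a toy sketch; the real frontend uses a proper flow hash over the header fields, and cksum here is just a stand-in:

```shell
# Toy sketch of flow-to-queue selection: hash a flow identifier and
# take it modulo the number of queues. The real frontend hashes the
# MAC/IP/port fields with a proper flow hash; cksum is a stand-in.
num_queues=4
flow="10.0.0.1:5001-10.0.0.2:43215"
hash=$(printf '%s' "$flow" | cksum | cut -d' ' -f1)
queue=$((hash % num_queues))
echo "flow $flow -> queue $queue"
```

Flows that hash to different queues can then be serviced by different netback threads in parallel.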
Crucially, this means that separate netback threads can work on each TCP stream in parallel. So where we were previously limited by the capacity of a single dom0 vCPU to process packets, now we can exploit several dom0 vCPUs. And where the capacity of a single Transmit ring limited the total amount of data in-flight, the system can now support a larger amount.
Which use-cases can take advantage of multi-queue?
Anything involving multiple TCP streams. For example, any kind of server VM that handles connections from more than one client at the same time.
Which guests can use multi-queue?
Since frontend changes are needed, the version of the guest's netfront driver matters. Although dom0 is geared up to support multi-queue, guests with old versions of netfront that lack multi-queue support are limited to single Transmit and Receive rings.
- For Windows, the XenServer 7.0 xennet PV driver supports multi-queue.
- For Linux, multi-queue support was added in Linux 3.16. This means that Debian Jessie 8.0 and Ubuntu 14.10 (or later) support multi-queue with their stock kernels. Over time, more and more distributions will pick up the relevant netfront changes.
How does the throughput scale with an increasing number of rings?
The following graph shows some measurements I made using iperf 2.0.5 between a pair of Debian 8.0 VMs both on a Dell R730xd host. The VMs each had 8 vCPUs, and iperf employed 8 threads each generating a separate TCP stream. The graph reports the sum of the 8 threads' throughputs, varying the number of queues configured on the guests' VIFs.
We can make several observations from this graph:
- The throughput scales well up to four queues, with four queues achieving more than double the throughput possible with a single queue.
- The blip at five queues probably arose when the hashing algorithm failed to spread the eight TCP streams evenly across the queues, and is thus a measurement artefact. With different TCP port numbers, this may not have happened.
- While the throughput generally increases with an increasing number of queues, the throughput is not proportional to the number of rings. Ideally, the throughput would double when you double the number of rings. This doesn't happen in practice because the processing is not perfectly parallelisable: netfront needs to demultiplex the streams onto the rings, and there are some overheads due to locking and synchronisation between queues.
This graph also highlights the substantial improvement over XenServer 6.5, in which only one queue per VIF was supported. In this use-case of eight TCP streams, XenServer 7.0 achieves 41 Gb/s out-of-the-box where XenServer 6.5 could manage only 17 Gb/s – an improvement of 140%.
How many rings do I get by default?
By default the number of queues is limited by (a) the number of vCPUs the guest has and (b) the number of vCPUs dom0 has. A guest with four vCPUs will get four queues per VIF.
This is a sensible default, but if you want to manually override it, you can do so in the guest. In a Linux guest, add the parameter xen_netfront.max_queues=n, for some n, to the kernel command-line.
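For example, to make the setting persistent in a guest that boots via GRUB (the file path and the value of 4 are illustrative):

```ini
# /etc/default/grub in the guest -- cap netfront at 4 queues (example value)
GRUB_CMDLINE_LINUX_DEFAULT="quiet xen_netfront.max_queues=4"
```

After editing, regenerate the GRUB configuration (e.g. with update-grub on Debian-based guests) and reboot.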
The XenServer team made a number of significant performance and scalability improvements in the XenServer 7.0 release. This is the first in a series of articles that will describe the principal improvements.
Our first topic is storage I/O performance. A performance improvement has been achieved through the adoption of a polling technique in tapdisk3, the component of XenServer responsible for handling I/O on virtual storage devices. Measurements of one particular use-case demonstrate a 50% increase in performance from 15,000 IOPS to 22,500 IOPS.
What is polling?
Normally, tapdisk3 operates in an event-driven manner. Here is a summary of the first few steps required when a VM wants to do some storage I/O:
- The VM's paravirtualised storage driver (called blkfront in Linux or xenvbd in Windows) puts a request in the ring it shares with dom0.
- It sends tapdisk3 a notification via an event-channel.
- This notification is delivered to domain 0 by Xen as an interrupt. If domain 0 is not running, it will need to be scheduled in order to receive the interrupt.
- When it receives the interrupt, the domain 0 kernel schedules the corresponding backend process, tapdisk3, to run.
- When tapdisk3 runs, it looks at the contents of the shared-memory ring.
- Finally, tapdisk3 finds the request which can then be transformed into a physical I/O request.
Polling is an alternative to this approach in which tapdisk3 repeatedly looks in the ring, speculatively checking for new requests. This means that steps 2–4 can be skipped: there's no need to wait for an event-channel interrupt, nor to wait for the tapdisk3 process to be scheduled: it's already running. This enables tapdisk3 to pick up the request much more promptly as it avoids these delays inherent to the event-driven approach.
The following diagram contrasts the timelines of these alternative approaches, showing how polling reduces the time until the request is picked up by the backend.
How does polling help improve storage I/O performance?
Polling is an established technique for reducing latency in event-driven systems. (One example of where it is used elsewhere to mitigate interrupt latency is in Linux networking drivers that use NAPI.)
Servicing I/O requests promptly is an essential part of optimising I/O performance. As I discussed in my talk at the 2015 Xen Project Developer Summit, reducing latency is the key to maintaining a low virtualisation overhead. As physical I/O devices get faster and faster, any latency incurred in the virtualisation layer becomes increasingly noticeable and translates into lower throughputs.
An I/O request from a VM has a long journey to physical storage and back again. Polling in tapdisk3 optimises one section of that journey.
Isn't polling really CPU intensive, and thus harmful?
Yes it is, so we need to handle it carefully. If left unchecked, polling could easily eat huge quantities of domain 0 CPU time, starving other processes and causing overall system performance to drop.
We have chosen to do two things to avoid consuming too much CPU time:
- Poll the ring only when there's a good chance of a request appearing. Of course, guest behaviour is totally unpredictable in general, but there are some principles that can increase our chances of polling at the right time. For example, one assumption we adopt is that it's worth polling for a short time after the guest issues an I/O request: having issued one request, there's a good chance that it will issue another soon after. If this guess turns out to be correct, tapdisk3 keeps polling for a bit longer in case any more turn up. If there are none for a while, it stops polling and temporarily falls back to the event-based approach.
- Don't poll if domain 0 is already very busy. Since polling is expensive in terms of CPU cycles, we only enter the polling loop if we are sure that it won't starve other processes of CPU time they may need.
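The first of these heuristics can be caricatured as follows. tapdisk3 is written in C, and the fixed budget of empty checks here is purely illustrative:

```shell
# Toy model of the back-off heuristic: keep polling while requests keep
# arriving; after POLL_BUDGET consecutive empty checks, fall back to
# the event-driven path.
POLL_BUDGET=3
mode=polling
empty_checks=0
for ring_slot in req req empty empty empty empty; do
  if [ "$ring_slot" = "req" ]; then
    empty_checks=0              # a request arrived: keep polling
  else
    empty_checks=$((empty_checks + 1))
    if [ "$empty_checks" -ge "$POLL_BUDGET" ]; then
      mode=event-driven         # nothing for a while: stop polling
    fi
  fi
done
echo "$mode"   # prints "event-driven"
```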
How much faster does it go?
The benefit you will get from polling depends primarily on the latency of your physical storage device. If you are using an old mechanical hard-drive or an NFS share on a server on the other side of the planet, shaving a few microseconds off the journey through the virtualisation layer isn't going to make much of a difference. But on modern devices and low-latency network-based storage, polling can make a sizeable difference. This is especially true for smaller request sizes since these are most latency-sensitive.
For example, the following graph shows an improvement of 50% in single-threaded sequential read I/O for small request sizes – from 15,000 IOPS to 22,500 IOPS. These measurements were made with iometer in a 32-bit Windows 7 SP1 VM on a Dell PowerEdge R730xd with an Intel P3700 NVMe drive.
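Those two figures imply the per-request latency saving, since for single-threaded serial I/O the IOPS figure is simply the reciprocal of the per-request latency:

```shell
# Back-of-envelope: 15,000 -> 22,500 IOPS at queue depth 1 implies
# roughly 66.7 us -> 44.4 us per request, about 22 us saved per
# round trip through the virtualisation layer.
awk 'BEGIN {
  before = 1e6 / 15000    # microseconds per request without polling
  after  = 1e6 / 22500    # microseconds per request with polling
  printf "before: %.1f us, after: %.1f us, saved: %.1f us\n",
         before, after, before - after
}'
```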
How was polling implemented?
The code to add polling to tapdisk3 can be found in the following set of commits: https://github.com/xapi-project/blktap/pull/179/commits.
With the exciting release of the latest XenServer Dundee beta, the immediate reaction is to download it to give it a whirl to see all the shiny new features (and maybe to find out if your favourite bug has been fixed!). Unfortunately, it's not something that can just be installed, tested and uninstalled like a normal application - you'll need to find yourself a server somewhere you're willing to sacrifice in order to try it out. Unless, of course, you decide to use the power of virtualisation!
XenServer as a VM
Nested virtualisation - running a VM inside another VM - is not something that anyone recommends for production use, or even something that works at all in some cases. However, since Xen has its origins way back before hardware virtualisation became ubiquitous in Intel processors, running full PV guests (which don't require any HW extensions) under a XenServer that is itself running as a VM actually works very well indeed. So for the purposes of evaluating a new release of XenServer it's actually a really good solution. It's also ideal for trying out many of the Unikernel implementations, such as Mirage or Rump kernels, as these are pure PV guests too.
XenServer works very nicely when run on another XenServer, and indeed this is what we use extensively to develop and test our own software. But once again, not everyone has spare capacity to do this. So let's look to some other virtualisation solutions that aren't quite so server focused and that you might well have installed on your own laptop. Enter Oracle's VirtualBox.
VirtualBox, while not as performant a virtualization solution as Xen, is a very capable platform that runs XenServer without any problems. It also has the advantage of being easily installable on your own desktop or laptop. Therefore it's an ideal way to try out these betas of XenServer in a quick and convenient way. It also has some very convenient tools that have been built around it, one of which is Vagrant.
Vagrant is a tool for provisioning and managing virtual machines. It targets several virtualization platforms including VirtualBox, which is what we'll use now to install our XenServer VM. The model is that it takes a pre-installed VM image - what Vagrant calls a 'box' - and some provisioning scripts (using shell scripts, Salt, Chef, Ansible or others), and sets up the VM in a reproducible way. One of its key benefits is that it simplifies management of these boxes, and HashiCorp run a service called Atlas that will host your boxes and the metadata associated with them. We have used this service to publish a Vagrant box for the Dundee Beta.
Try the Dundee Beta
Once you have Vagrant installed, trying the Dundee beta is as simple as:
vagrant init xenserver/dundee-beta
vagrant up
This will download the box image (about 1 Gig) and create a new VM from it. As it's booting it will ask which network to bridge onto; if you want your nested VMs to be available on the network, this should be a wired network rather than wireless.
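If you'd rather not be prompted for the bridge each time, the generated Vagrantfile can pre-select it (the interface name below is an example; yours will differ):

```ruby
# Vagrantfile fragment -- pre-select the bridged interface so
# `vagrant up` doesn't prompt; the bridge name is host-specific.
Vagrant.configure("2") do |config|
  config.vm.box = "xenserver/dundee-beta"
  config.vm.network "public_network", bridge: "en0: Ethernet"
end
```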
The XenServer image is tweaked a little bit to make it easier to access - for example, it will by default DHCP all of the interfaces, which is useful for testing XenServer, but wouldn't be advisable for a real deployment. To connect to your XenServer, we need to find the IP address, so the simplest way of doing this is to ssh in and ask:
Mac-mini:xenserver jon$ vagrant ssh -c "sudo xe pif-list params=IP,device"
device ( RO) : eth1
    IP ( RO) : 192.168.1.102
device ( RO) : eth2
    IP ( RO) : 172.28.128.5
device ( RO) : eth0
    IP ( RO) : 10.0.2.15
So you should be able to connect using one of those IPs via XenCenter or via a browser to download XenCenter (or via any other interface to XenServer).
Let's now go all Inception and install ourselves a VM within our XenServer VM. Let's assume, for the sake of argument, and because as I'm writing this it's quite true, that we're not running on a Windows machine, nor do we have one handy to run XenCenter on. We'll therefore restrict ourselves to using the CLI.
As mentioned before, HVM VMs are out so we're limited to pure PV guests. Debian Wheezy is a good example of one of these. First, we need to ssh in and become root:
Mac-mini:xenserver jon$ vagrant ssh
Last login: Thu Mar 31 00:10:29 2016 from 10.0.2.2
[vagrant@localhost ~]$ sudo bash
[root@localhost vagrant]#
Now we need to find the right template:
[root@localhost vagrant]# xe template-list name-label="Debian Wheezy 7.0 (64-bit)"
uuid ( RO) : 429c75ea-a183-a0c0-fc70-810f28b05b5a
name-label ( RW): Debian Wheezy 7.0 (64-bit)
name-description ( RW): Template that allows VM installation from Xen-aware Debian-based distros. To use this template from the CLI, install your VM using vm-install, then set other-config:install-repository to the path to your network repository, e.g. http:///
Now, as the description says, we use 'vm-install' and set the mirror:
[root@localhost vagrant]# xe vm-install template-uuid=429c75ea-a183-a0c0-fc70-810f28b05b5a new-name-label=wheezy
[root@localhost vagrant]# xe vm-param-set uuid=479f228b-c502-a791-85f2-a89a9f58e17f other-config:install-repository=http://ftp.uk.debian.org/debian
The VM doesn't have any network connection yet, so we'll need to add a VIF. We saw the IP addresses of the network interfaces above, and in my case eth1 corresponds to the bridged network I selected when starting the XenServer VM using Vagrant. To create the VIF I need the uuid of that network, so I'll list the networks:
[root@localhost vagrant]# xe network-list
uuid ( RO)                : c7ba748c-298b-20dc-6922-62e6a6645648
          name-label ( RW): Pool-wide network associated with eth2
    name-description ( RW):
              bridge ( RO): xenbr2
uuid ( RO)                : f260c169-20c3-2e20-d70c-40991d57e9fb
          name-label ( RW): Pool-wide network associated with eth1
    name-description ( RW):
              bridge ( RO): xenbr1
uuid ( RO)                : 8d57e2f3-08aa-408f-caf4-699b18a15532
          name-label ( RW): Host internal management network
    name-description ( RW): Network on which guests will be assigned a private link-local IP address which can be used to talk XenAPI
              bridge ( RO): xenapi
uuid ( RO)                : 681a1dc8-f726-258a-eb42-e1728c44df30
          name-label ( RW): Pool-wide network associated with eth0
    name-description ( RW):
              bridge ( RO): xenbr0
So I need a VIF on the network with uuid f260c...
[root@localhost vagrant]# xe vif-create vm-uuid=479f228b-c502-a791-85f2-a89a9f58e17f network-uuid=f260c169-20c3-2e20-d70c-40991d57e9fb device=0
e96b794e-fef3-5c2b-8803-2860d8c2c858
All set! Let's start the VM and connect to the console:
[root@localhost vagrant]# xe vm-start uuid=479f228b-c502-a791-85f2-a89a9f58e17f
[root@localhost vagrant]# xe console uuid=479f228b-c502-a791-85f2-a89a9f58e17f
This should drop us into the Debian installer:
A few keystrokes later and we've got ourselves a nice new VM all set up and ready to go.
All of the usual operations will work: start, shutdown, reboot, suspend, checkpoint and even, if you want to set up two XenServer VMs, migration and storage migration. You can experiment with bonding, try multipathed iSCSI, check that alerts are generated, and almost anything else (with the exception of HVM and anything hardware-specific such as VGPUs, of course!). It's also an ideal companion to the Docker build environment I blogged about previously, as any new things you might be experimenting with can be easily built using Docker and tested using Vagrant. If anything goes wrong, a 'vagrant destroy' followed by a 'vagrant up' and you've got a completely fresh XenServer install to try again in less than a minute.
The Vagrant box is itself created using Packer, a tool often used to create Vagrant boxes. The configuration for this is available on github, and feedback on this box is very welcome!
In a future blog post, I'll be discussing how to use Vagrant to manage XenServer VMs.
I am pleased to announce that Dundee beta 3 has been released, and for those of you who monitor Citrix Tech Previews, beta 3 corresponds to the core XenServer platform used for Dundee TP3. This third beta marks a major development milestone representing the proverbial "feature complete" stage. Normally when announcing pre-release builds, I highlight major functional advances but this time I need to start with a feature which was removed.
Thin Provisioned Block Storage Removed
While it's never great to start with a negative, I felt anything related to the removal of a storage option takes priority over new and shiny. I'm going to keep this section short, and also highlight that only the new thin provisioned block storage feature was removed; existing thin provisioned NFS and file-based storage repositories will function as they always have.
What should I do before upgrading to beta 3?
While we don't actively encourage upgrades to pre-release software, we do recognize you're likely to do it at least once. If you have built out infrastructure using thin provisioned iSCSI or HBA storage using a previous pre-release of Dundee, please ensure you migrate any critical VMs to either local storage, NFS or thick provisioned block storage prior to performing an upgrade to beta 3.
So what happened?
As is occasionally the case with pre-release software, not all features which are previewed will make it to the final release, for any of a variety of reasons. That is of course one reason we provide pre-release access. In the case of the thin provisioned block storage implementation present in earlier Dundee betas, we ultimately found that it had issues under performance stress. As a result, we've made the difficult decision to remove it from Dundee at this time. Investigation into alternative implementations is underway, and the team is preparing a more detailed blog post on future directions.
Beta 3 Overview
Much of the difference between beta 2 and beta 3 can be found in the details. dom0 has been updated to a CentOS 7.2 userspace, the Xen Project hypervisor is now 4.6.1, and the kernel is 3.10.96. Support for the xsave and xrstor floating point instructions has been added, enabling guest VMs to use the AVX instructions available on newer Intel processors. We've also added experimental support for the Microsoft Windows Server 2016 Tech Preview and the Ubuntu 16.04 beta.
Beta 3 Bug Fixes
Earlier pre-releases of Dundee had an issue wherein performing a storage migration of a VM with snapshots, in particular orphaned snapshots, would result in migration errors. Work has been done to resolve this, and it would be beneficial for anyone taking beta 3 to exercise storage motion to validate that the fix is complete.
One of the focus areas for Dundee is to improve scalability, and as part of that we've uncovered some situations where overall reliability wasn't what we wanted. An example of such a situation, which we've resolved, occurs when a VM with a very large number of VBDs is running on a host, and a XenServer admin requests the host to shutdown. Prior to the fix, such a host would become unresponsive.
The default logrotate configuration for xensource.log has been changed to rotate at 100MB in addition to daily. This change was made because, on very active systems, the prior configuration could result in excessive disk consumption.
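For reference, a policy of this shape is what logrotate's maxsize directive expresses when combined with a time interval (a sketch only; the path and ancillary options in the shipped configuration may differ):

```
/var/log/xensource.log {
    daily            # rotate once a day...
    maxsize 100M     # ...or sooner, whenever the file exceeds 100MB
    rotate 5         # number of rotated copies kept (illustrative)
    compress
    missingok
}
```

Note that plain `size` would rotate only on size and ignore the interval; `maxsize` is the directive that honours both conditions.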
Building bits of XenServer outside of Citrix has in the past been a challenging task, requiring careful construction of the build environment to replicate what 'XenBuilder', our internal build system, puts together. This has meant using custom DDK VMs or carefully installing by hand a set of packages taken from one of the XenServer ISOs. With XenServer Dundee, this pain will be a thing of the past, and making a build environment will be just a 'docker run' away.
Part of the work that's being done for XenServer Dundee has been moving things over to standard build tools and packaging. In previous releases there was a mix of RPMs, tarballs and patches for existing files, but for the Dundee project everything installed into dom0 is now packaged into an RPM. Drawing on the inspiration and knowledge gained while working on xenserver/buildroot, we're now building most of these dom0 packages using mock. Mock is a standard tool for building RPM packages from source RPMs (SRPMs), and it works by constructing a completely clean chroot with only the dependencies defined by the SRPM. This means that everything needed to build a package must be in an RPM, and the dependencies defined by the SRPM must be correct too.
From the point of view of making reliably reproducible builds, using mock means there is very little possibility of the build depending upon the environment. But there is also a side benefit of this work: if you actually want to rebuild a bit of XenServer, you just need a yum repository with the XenServer RPMs in it; use 'yum-builddep' to put in place all of the build dependencies, and then building should be as simple as cloning the repository and typing 'make'.
The simplest place to do this would be in the dom0 environment itself, particularly now that the partition size has been bumped up to 20 gigs or so. However, that may well not be the most convenient. In fact, for a use case like this, the mighty Docker provides a perfect solution. Docker can quickly pull down a standard CentOS environment and then put in the reference to the XenServer yum repository, install gcc, OCaml, git, emacs and generally prepare the perfect build environment for development.
In fact, even better, Docker will actually do all of these bits for you! The docker hub has a facility for automatically building a Docker image provided everything required is in repository on Github. So we've prepared a repository containing a Dockerfile and associated gubbins that sets things up as above, and then the docker hub builds and hosts the resulting docker image.
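Conceptually, the image the hub builds boils down to a Dockerfile along these lines (a simplified sketch; the repo file name, package list and other details are illustrative rather than the actual contents of the xenserver/xenserver-build-env repository):

```
FROM centos:7

# Point yum at a XenServer RPM repository (illustrative file name)
COPY xenserver.repo /etc/yum.repos.d/

# Typical build tooling for dom0 packages
RUN yum install -y gcc make git rpm-build yum-utils ocaml && \
    yum clean all

WORKDIR /root
```

The docker hub rebuilds the image automatically whenever the GitHub repository changes, so the published image tracks the Dockerfile.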
Let's dive in with an example on how to use this. Say you have a desire to change some aspect of how networking works on XenServer, something that requires a change to the networking daemon itself, 'xcp-networkd'. We'll start by rebuilding that from the source RPM. Start the docker container and install the build dependencies:
$ docker run -i -t xenserver/xenserver-build-env
[root@15729a23550b /]# yum-builddep -y xcp-networkd
This will now download and install everything required to build the network daemon. Next, let's download and build the SRPM:
[root@15729a23550b /]# yumdownloader --source xcp-networkd
At the time of writing, this downloads the SRPM "xcp-networkd-0.9.6-1+s0+0.10.0+8+g96c3fcc.el7.centos.src.rpm". This will build correctly in our environment:
[root@15729a23550b /]# rpmbuild --rebuild xcp-networkd-*
...
[root@15729a23550b /]# ls -l ~/rpmbuild/RPMS/x86_64/
total 2488
-rw-r--r-- 1 root root 1938536 Jan  7 11:15 xcp-networkd-0.9.6-1+s0+0.10.0+8+g96c3fcc.el7.centos.x86_64.rpm
-rw-r--r-- 1 root root  604440 Jan  7 11:15 xcp-networkd-debuginfo-0.9.6-1+s0+0.10.0+8+g96c3fcc.el7.centos.x86_64.rpm
Alternatively, you can compile straight from the source. Most of our software is hosted on github, either under the xapi-project or xenserver organisations. xcp-networkd is a xapi-project repository, so we can clone it from there:
[root@15729a23550b /]# cd ~
[root@15729a23550b ~]# git clone git://github.com/xapi-project/xcp-networkd
Most of our RPMs have version numbers constructed automatically containing useful information about the source, and where the source is from git repositories the version information comes from 'git describe'.
[root@15729a23550b ~]# cd xcp-networkd
[root@15729a23550b xcp-networkd]# git describe --tags
v0.10.0-8-g96c3fcc
The important part here is the hash, in this case '96c3fcc'. Comparing with the SRPM version, we can see these are identical. We can now just type 'make' to build the binaries:
[root@15729a23550b xcp-networkd]# make
This networkd binary can then be put onto your XenServer host and run.
The yum repository used by the container is created directly from the snapshot ISOs uploaded to xenserver.org, using a simple bash script named update_xs_yum.sh, available on github. The container defaults to using the most recently available release, but the script can be used by anyone to generate a repository from the daily snapshots too, if required. There's still a way to go before Dundee is released, and some aspects of this workflow are in flux – for example, the RPMs aren't currently signed. However, by the time Dundee is out the door we hope to have made many improvements in this area. Certainly here in Citrix, many of us have switched to using this for our day-to-day build needs, because it's simply far more convenient than our old custom chroot generation mechanism.
As anyone with more than one type of server is no doubt aware, CPU feature levelling is the method by which we try to make it safe for VMs to migrate. If each host in a pool is identical, this is easy. However if the pool has non-identical hardware, we must hide the differences so that a VM which is migrated continues without crashing.
I don't think I am going to surprise anyone by admitting that the old way XenServer did feature levelling was clunky and awkward to use. Because of a step change introduced in Intel Ivy Bridge CPUs, feature levelling also ceased to work correctly. As a result, we took the time to redesign it, and the results are available for use in Dundee beta 2.
When a VM boots, the kernel looks at the available featureset and, in general, turns on as much as it knows how to. Linux will go so far as to binary patch some of its hotpaths for performance reasons. Userspace libraries frequently have the same algorithm compiled multiple times, and will select the best one to use at startup, based on which CPU instructions are available.
On a bare metal system, the featureset will not change while the OS is running, and the same expectations exist in the virtual world. Migration introduces a problem with this expectation; it is literally like unplugging the hard drive and RAM from one computer, plugging it into another, and letting the OS continue from where it was. For the VM not to crash, all the features it is using must continue to work at the destination, or in other words, features which are in use must not disappear at any point.
Therefore, the principle of feature levelling is to calculate the common subset of features available across the pool, and restrict the VM to this featureset. That way the VM's featureset will always work, no matter which pool member it is currently running on.
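As a toy illustration of that principle (not XenServer's actual implementation), think of each host's featureset as a bitmap; the pool level is then simply the bitwise AND of all the hosts' bitmaps:

```shell
#!/bin/sh
# Hypothetical feature bitmaps for two hosts (one word each).
host_a=0x00fff7ff   # newer CPU: more feature bits set
host_b=0x00ff37ff   # older CPU: some bits missing

# The safe pool-wide featureset is the intersection of the two.
pool_level=$(( host_a & host_b ))

printf 'pool featureset: %#010x\n' "$pool_level"
# prints: pool featureset: 0x00ff37ff
```

A host whose bitmap already equals the pool level loses nothing by joining; a more capable host simply has its extra bits masked from guests.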
There are many factors affecting the available featureset for VMs to use:
- The CPU itself
- The BIOS/firmware settings
- The hypervisor command line parameters
- The restrictions which the toolstack chooses to apply
Hiding features is also tricky; x86 provides no architectural means to do so. Feature levelling is therefore implemented using vendor-specific extensions, which are documented as unstable interfaces and liable to change at any point in the future (as happened with Ivy Bridge).
XenServer Pre Dundee
Older versions of XenServer required a new pool member to be identical before it would be permitted to join. Making this happen involved consulting the features field of xe host-cpu-info, divining some command line parameters for Xen, and rebooting.
If everything went to plan, the new slave would be permitted to join the pool. Once a pool had been created, it was assumed to be homogeneous from that point on. The command line parameters affected the entire host, including Xen itself. In a slightly heterogeneous pool, the differing features tended to be ones which only Xen itself would care to use, so Xen was needlessly penalised along with dom0.
In Dundee, we have made some changes:
There are two featuresets rather than one. PV and HVM guests are fundamentally different types of virtualisation, and come with different restrictions and abilities. HVM guests will necessarily have a larger potential featureset than PV, and having a single featureset which is safe for PV guests would apply unnecessary restrictions to HVM guests.
The featuresets are recalculated and updated every boot. Assuming that servers stay the same after initial configuration is unsafe, and incorrect.
So long as the CPU Vendor is the same (i.e. all Intel or all AMD), a pool join will be permitted to happen, irrespective of the available features. The pool featuresets are dynamically recalculated every time a slave joins or leaves a pool, and every time a pool member reconnects after reboot.
When a VM is started, it will be given the current pool featureset. This permits it to move anywhere in the pool, as the pool existed when it started. Changes in pool level have no effect on running VMs. Their featureset is fixed at boot (and is fixed across migrate, suspend/resume, etc.), which matches the expectation of the OS. (One release note: to update the VM featureset, the VM must be shut down fully and restarted. This is contrary to what would be expected with a plain reboot, and exists because of some metadata caching in the toolstack which is proving hard to untangle.)
Migration safety checks are performed between the VM's fixed featureset and the destination host's featureset. This way, even if the pool level drops because a less capable slave joined, an already-running VM will still be able to move anywhere except to the new, less capable slave.
The method of hiding features from VMs now has no effect on Xen and dom0. They never migrate, and will have access to all the available features (albeit with dom0 subject to being a VM in the first place).
We hope that the changes listed above will make Dundee far easier to use in a heterogeneous setup.
All of this information applies equally to inter-pool migration (Storage XenMotion) and intra-pool migration. In the case of upgrade from older versions of XenServer (Rolling Pool Upgrade, or Storage XenMotion again), there is a complication because of having to fill in some gaps in the older toolstack's idea of the VM's featureset. In such a case, an incoming VM is assumed to have the featureset of the host it lands on, rather than the pool level. This matches the older logic (and is the only safe course of action), but does result in the VM possibly having a higher featureset than the pool level, and being less mobile as a result. Once the VM has been shut down and started back up again, it will be given the regular pool level and behave normally.
To summarise the two key points:
- We expect feature levelling to work between any combination of CPUs from the same vendor (with the exception of PV guests on pre-Nehalem CPUs, which lacked any masking capabilities whatsoever).
- We expect there to be no need for any manual feature configuration to join a pool together.
That being said, this is a beta release and I highly encourage you to try it out and comment/report back.
Recently, the XenServer Engineering Team has been working on improving the responsiveness of the control domain when it is under heavy load. Many VMs doing lots of I/O operations can prevent one from connecting to the host through ssh, or cause the XenCenter session to disconnect for no apparent reason. All of this happened when the control plane was overloaded by the datapath plane, leaving very little CPU for such important processes as sshd or xapi. Let's have a look at how much time it takes to repeatedly execute a simple xe vm-list command on a host with 20 VM pairs doing heavy network communication:
Most of the commands took around 2-3 seconds to complete, but some of them took as long as 30 seconds. The 2-3 seconds is slower than it should be, and 20-30 seconds is way outside of a reasonable operating window. The slow reaction time of 3 seconds and the heavy spikes of 30 seconds visible on the graph above are two separate issues affecting the responsiveness of the control commands. Let's tackle them one by one.
To fix the 2-3 second slowdown, we took advantage of the Linux kernel feature called cgroups (control groups). Cgroups allow the aggregation of processes into separate hierarchies that manage their access to various resources (CPU, memory, network). In our case, we utilised CPU resource isolation, placing all control path daemons in the cpu control group subsystem. Giving them ten times more cpu share than datapath processes guarantees they get enough computing power to execute control operations in a timely fashion. It's worth pointing out that this does not slow down the datapath when the control plane is idle; the datapath reduces its cpu usage only when control operations need to run.
We can see that the majority of the commands took just a fraction of a second to execute, which solves the first of our problems.
What about the commands that took 20-30 seconds to print out the list of VMs? This was caused by the way in which xapi handles the creation of threads, limiting the rate based on the current load and memory usage in dom0. When the load goes too high, there are not enough xapi threads to handle the requests, which results in the periodic spikes in the execution time of the xe commands. This feature was useful when dom0 was 32-bit, when an increased number of threads might have compromised the stability of the whole system. Now that dom0 is 64-bit, and with control groups enabled, we decided it is perfectly safe to remove xapi's thread-limiting feature.
With these changes applied, the execution times of control path commands became as one would expect them to be:
In spite of heavy I/O load, control path processes receive all the CPU they need to get the job done, and so can do it without any delay, leaving the user with a nicely responsive host regardless of the load in the guests. This makes a tremendous difference to the user experience when interacting with the host via XenCenter, the xe CLI, or SSH.
Another real-world example in which we expected significant improvements is a bootstorm. In this benchmark we start more than a hundred VMs and measure how much time it takes for the guests to become fully operational (time measured from starting the 1st VM to the completion of the n-th VM). The usual strategy employed is to start 25 VMs at a time. The following is a comparison of the results before and after the changes:
Before, booting guests overloaded the control path, which slowed down the boot process of later VMs. After our improvements, the time to boot consecutive guests grows linearly, with the whole benchmark completing twice as fast compared to the build without the changes.
Another view on the same data - showing the time to boot a single VM:
CPU resource isolation and the xapi improvements make VMs resilient to the load generated by simultaneously booting guests. Each of them takes the same amount of time to become ready, in contrast to the significant increase seen on the host without the changes. That is how you would expect the control plane to operate.
What other benefits do these improvements bring for XenServer users? They will have no more problems synchronizing XenCenter with the host or issuing commands to xapi. We expect that XenDesktop users should now be able to start many VMs on the pool master while leaving it responsive to control path commands. This allows them to run more VMs on the master, reducing the necessary hardware and decreasing the total cost of ownership. Cloud administrators can have increased confidence in their ability to administer the host despite unknown and unpredictable workloads in their tenants' VMs.
For anyone interested in playing around with the new feature, here are a couple of details of the implementation and the organisation of files in the dom0.
All the changes are contained in a single rpm called control-slice. The control-slice itself is a systemd unit that defines a new slice to which all control-path daemons are assigned. You can find its configuration in the following file:
# cat /etc/systemd/system/control.slice
[Unit]
Description=Control path slice
By modifying the CPUShares parameter one can change the cpu shares that control-path processes get. Since the base value is 1024, assigning shares of, for example, 2048 would mean that control-path processes get twice the processing power of datapath processes. The default value for the control-slice is 10240, which means control-path processes get up to ten times more cpu than datapath ones. To apply the changes, one has to reload the configuration and restart the control.slice:
# systemctl daemon-reload
# systemctl restart control.slice
Each daemon that belongs to the control-slice has a simple configuration file that specifies the name of the slice it belongs to; for example, for xapi we have:
# cat /etc/systemd/system/xapi.service.d/slice.conf
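Given standard systemd syntax, such a drop-in amounts to little more than a Slice= directive; the following is an assumption based on how systemd assigns services to slices, not the verbatim file contents:

```
[Service]
Slice=control.slice
```

Any service carrying this drop-in is placed in the control.slice cgroup at startup, and so inherits the slice's CPUShares setting.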
Last but not least, systemd provides admins with a powerful utility that allows monitoring of cgroup resource utilisation.
The above improvements are planned for the forthcoming XenServer Dundee release, and can be experienced in the Dundee beta.2 preview. Let us know if you like them and whether they make a difference for you!
With 2015 quickly coming to a close, and 2016 beckoning, it's time to deliver a holiday present to the XenServer community. Today, we've released beta 2 of project Dundee. While the lag between beta 1 in September and today has been a bit longer than many would've liked, part of that lag was due to the effort involved in resolving many of the issues reported. The team is confident you'll find both beta 2 and Steve Wilson's blog affirming Citrix's commitment to XenServer to be a nice gift. As part of that gift, we're planning to have a series of blogs covering a few of the major improvements in depth, but for those of you who like the highlights - let's jump right in!
XenServer has for many years supported the ability to create resource pools with processors from different CPU generations, but a few years back a change was made in Intel CPUs which impacted our ability to mix the newest CPUs with much older ones. The good news is that with Dundee beta.2, that situation should be fully resolved, and may indeed offer some performance improvements. Since this is an area where we really need to get things absolutely correct, we'd appreciate anyone running Dundee trying this out if you can, and reporting back on successes and issues.
Modern servers keep increasing their capacity, and not only do we need to keep pace, but we need to ensure users can create VMs which mirror the capacity of a physical machine. Dundee beta.2 now supports up to 512 physical cores (pCPUs), and can create guest VMs with up to 1.5 TB RAM. Some of you might ask about increasing vCPU limits, and we've bumped those up to 32 as well. We've also enabled Page Modification Logging (PML) in the Xen Project hypervisor by default. The full design details for PML are posted in the Xen Project archives for review, if you'd like to get into the weeds of why this is valuable. Lastly, we've bumped the kernel version to 3.10.93.
New SUSE templates
Since Dundee beta.1 was made available in late September, a number of security hotfixes for XenServer 6.5 SP1 have been released. Where valid, those same security patches have been applied to Dundee and are included in beta.2.
We are very pleased to make the first beta of XenServer Dundee available to the community. As with all pre-release downloads, this can be found on the XenServer Preview page. This release does include some potential commercial features, and if you are an existing Citrix customer you can access those features using the XenServer Tech Preview. It's also important to note that a XenServer host installed from the installer obtained from either source will have an identical version number and identical functionality; application of a Tech Preview license unlocks the potential commercial functionality. So with the "where do I get Dundee beta 1" question out of the way, I bet you're all interested in what the cool bits are, and what might be worth paying attention to. With that in mind, here are some of the important platform differences between XenServer 6.5 SP1 and Dundee beta 1.
The control domain, dom0, has undergone some significant changes. Last year we moved to a 64 bit control domain with a 3.10 kernel as part of our effort to increase overall performance and scalability. That change allowed us to increase VM density to 1000 VMs per host while making some significant gains in both storage and network performance. The dom0 improvements continue, and could have a direct impact on how you manage a XenServer.
dom0 now uses CentOS 7 as its core operating system, and along with that change comes a significant change in how "agents" and some scripts run. CentOS 7 has adopted systemd, and by extension so too has XenServer. This means that shell scripts started at system initialization time will need to change to follow the unit and service definition model specified by systemd.
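For example, a shell script previously launched from an init.d script would now be described by a unit file of roughly this shape (a generic systemd sketch; the names and paths are illustrative, not a unit shipped with XenServer):

```
# /etc/systemd/system/my-startup-task.service
[Unit]
Description=Example site-local startup task
After=network.target

[Service]
Type=oneshot
ExecStart=/usr/local/bin/my-startup-task.sh

[Install]
WantedBy=multi-user.target
```

It would then be enabled with `systemctl enable my-startup-task.service` rather than being hooked into a runlevel.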
cgroups for Control Isolation
Certain xapi processes have been isolated into Linux control groups. This allows for greater system stability under extreme resource pressure, and has created a considerably more deterministic model for VM operations. The most notable area where this can be observed is under bootstorm conditions. In XenServer 6.5 and prior, starting large numbers of VMs could result in start operations being blocked due to resource contention, which could result in some VMs taking significantly longer to start than others. With xapi isolated into cgroups, VM start operations no longer block as before, resulting in VM start times being much more equitable. The same optimization can be seen in other VM operations, such as when large quantities of VMs are shut down.
RBAC Provider Changes
XenServer 6.5 and prior used an older version of Likewise to provide Active Directory authentication. Likewise is now known as PowerBroker, and XenServer is using the PowerBroker Identity Services to provide authentication for RBAC. This has improved performance, scale and reliability, especially for complex or nested directory structures. Since RBAC is core to delegated management of a XenServer environment, we are particularly interested in feedback on any issues users might have with RBAC in Dundee beta 1.
dom0 Disk Space Usage
In XenServer 6.5 and prior, dom0 disk space was limited to 4GB. While this size was sufficient for many configurations, it was limiting for more than a few of you. As a result we've split the dom0 disk into three core partitions: system, log and swap. The system partition is now 18GB, which should provide sufficient space for some time to come. This also means that the overall install space required for XenServer increases from 8GB to 46GB. As you can imagine, given the importance of this major change, we are very interested to learn of any situations where it prevents XenServer from installing or upgrading properly.
Having flexible storage options is very important to the efficient operation of any virtualization solution. To that end, we've added support for three highly requested storage improvements: thin provisioned block storage, NFSv4 and FCoE.
Thin Provisioned Block Storage
iSCSI and HBA block storage can now be configured to be thinly provisioned. This is of particular value to those users who provision guest storage with a high water mark, expecting that some allocated storage won't be used. With XenServer 6.5 and prior, the storage provider would allocate the entire disk space up front, which could result in significantly lower effective storage utilization, which in turn would increase the cost of virtualization. Now block storage repositories can be configured with an initial size and an increment value. Since storage is critical in any virtualization solution, we are very interested in feedback on this functional change.
Fibre Channel over Ethernet is a protocol which allows Fibre Channel traffic to travel over standard Ethernet networks. XenServer is now able to communicate with FCoE-enabled storage solutions, and can be configured at install time to allow boot from SAN with FCoE. If you are using FCoE in your environment, we are very interested in learning of any issues, as well as which CNA you used during your tests.
Many additional system level improvements have been made for Dundee beta 1, but the following highlight some of the operational improvements which have been made.
XenServer 6.5 and prior required legacy BIOS mode to be enabled on UEFI based servers. With Dundee beta 1, servers with native UEFI mode enabled should now be able to install and run XenServer. If you encounter a server which fails to install or boot XenServer in UEFI mode, please provide server details when reporting the incident.
Automatic Health Check
XenServer can now optionally generate a server status report on a schedule and automatically upload it to Citrix Insight Services (formerly known as TaaS). CIS is a free service which will then perform the analysis and report on any health issues associated with the XenServer installation. This automatic health check is in addition to the manual server status report facility which has been in XenServer for some time.
Improved Patch Management in XenCenter
Application of XenServer patches through XenCenter has become easier. The XenCenter updates wizard has been rewritten to find all patches available on Citrix's support website, rather than only those that have been installed on other servers. This avoids missing updates, and allows automatic clean-up of patch files at the end of the installation.
Why Participate in the Beta Program
These platform highlights speak to how significant the engineering effort has been to get us to beta 1. They also overshadow some other arguably core items, like the move to the Xen Project Hypervisor 4.6, support for up to 5TB of host RAM, or even Windows guest support for up to 1TB of RAM. What they do show is our commitment to the install base and their continued adoption of XenServer at scale. Last year we ran an incredibly successful prerelease program for XenServer Creedence, and it's partly through that program that XenServer 6.5 is as solid as it is. We're building on that solid base in the hope that Dundee will better those accomplishments, and we're once again requesting your help. Download Dundee. Test it in your environment. Push it, and let us know how it goes. Just please be aware that this is prerelease code which shouldn't be placed in production, and that we're not guaranteeing you'll ever be able to upgrade from it.
Download location: http://xenserver.org/prerelease
Defect database: https://bugs.xenserver.org
The XenServer team is pleased to announce the availability of the third alpha release in the Dundee release train. This release includes a number of performance oriented items and includes three new functional areas.
- Microsoft Windows 10 driver support is now present in the XenServer tools. The tools have yet to be WHQL certified and are not yet working for GPU use cases, but users can safely use them to validate Windows 10 support.
- FCoE storage support has been enabled for the Linux Bridge network stack. Note that the default network stack is OVS, so users wishing to test FCoE will need to convert the network stack to Bridge and will need to be aware of the feature limitations in Bridge relative to OVS.
- Docker support present in XenServer 6.5 SP1 is now also present in Dundee.
Considerable work has been performed to improve overall I/O throughput on larger systems and to improve system responsiveness under heavy load. As part of this work, the number of vCPUs available to dom0 has been increased on systems with more than 8 pCPUs. Early results indicate a significant improvement in throughput compared to Creedence. We are particularly interested in hearing from users who have previously experienced responsiveness or I/O bottlenecks: please take a look at Alpha.3 and share your observations.
Dundee alpha.3 can be downloaded from the pre-release download page.
Before we dive into the enhancements brought to the Storage XenMotion (SXM) feature in the XenServer Dundee Alpha 2 release, here is a refresher on the various VM migrations supported in XenServer and how users can leverage them for different use cases.
XenMotion refers to the live migration of a VM (with the VM's disks residing on shared storage) from one host to another within a pool with minimal downtime. Hence, with XenMotion we are moving the VM without touching its disks. The XenMotion feature is very helpful during host and pool maintenance, and is used with XenServer features such as Workload Balancing (WLB) and Rolling Pool Upgrade (RPU), where VMs residing on shared storage can be moved to another host within the pool.
Storage XenMotion is the marketing name given to two distinct XenServer features: live storage migration and VDI move. Both features refer to the movement of a VM's disk (VDI) from one storage repository to another. Live storage migration also includes the movement of the running VM from one host to another. In the initial state, the VM's disks can reside either on local storage of the host or on shared storage. They can then be motioned either to local storage of another host or to shared storage of a pool (or standalone host). The following classifications exist for SXM:
- When the source and destination hosts happen to be part of the same pool, we refer to it as IntraPool SXM. You can choose to migrate the VM's VDIs to local storage of the destination host or to another shared storage of the pool. E.g. leverage it when you need to live migrate VMs residing on a slave host to the master of a pool.
- When the source and destination hosts happen to be part of different pools, we refer to it as CrossPool SXM. VM migration between two standalone hosts can also be regarded as CrossPool SXM. You can choose to migrate the VM's VDIs to local storage of the destination host or to shared storage of the destination pool. E.g. I often leverage CrossPool SXM when I need to migrate VMs from a pool running an old version of XenServer to a pool running the latest XenServer.
- When the source and destination hosts are the same, we refer to it as LiveVDI Move. E.g. leverage LiveVDI Move when you want to upgrade your storage arrays but don't want to move the VMs to another host. In such cases, you can LiveVDI move a VM's VDI from, say, the shared storage which is due to be upgraded to another shared storage in the pool, without taking down your VMs.
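For reference, a CrossPool SXM operation is typically driven from the xe CLI along the following lines (a sketch based on the documented xe vm-migrate parameters; the UUIDs are placeholders and your deployment's values will differ):

```
# Live migrate a VM to a host in another pool, moving its VDI to a
# storage repository in the destination pool at the same time.
xe vm-migrate uuid=<vm-uuid> \
    remote-master=<destination-pool-master-address> \
    remote-username=root remote-password=<password> \
    host-uuid=<destination-host-uuid> \
    vdi:<vdi-uuid>=<destination-sr-uuid> \
    live=true
```

Omitting the remote-* parameters gives the IntraPool case, and the vdi: mapping is what directs each disk to its new storage repository.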
When SXM was first rolled out in XenServer 6.1, there were some restrictions on VMs before they could be motioned: a cap on the number of snapshots a VM could have while undergoing SXM, a VM with checkpoints could not be motioned, the VM had to be running (otherwise it's not live migration), and so on. For the XenServer Dundee Alpha 2 release, the XenServer team has removed some of those constraints. Below is the list of enhancements brought to SXM.
1. A VM can be motioned regardless of its power state. Therefore I can successfully migrate a suspended VM, or move a halted VM, within a pool (intra-pool) or across pools (cross-pool).
2. A VM can have more than one snapshot and checkpoint during an SXM operation. Thus a VM can have a mix of snapshots and checkpoints and still be successfully motioned.
3. A halted VM can be copied from, say, pool A to pool B (cross-pool copy). Note that VMs that are not in the halted state cannot be cross-pool copied.
4. User-created templates can also be copied from, say, pool A to pool B (cross-pool copy). Note that system templates cannot be cross-pool copied.
This is not the end of the SXM improvements! In an upcoming release we aim to reduce VM downtime further during migration operations, so stay tuned, and do download the Dundee preview and try it out yourself.
I am pleased to announce that today we have made available the second alpha build for XenServer Dundee. For those of you who missed the first alpha, it was focused entirely on the move to CentOS 7 for dom0. This important operational change is one that long-time XenServer users, and those who have written management tooling for XenServer, should be aware of throughout the Dundee development cycle. At the time of Alpha 1, no mention was made of feature changes; with Alpha 2 we're going to talk about some features. So here are some of the important items to be aware of.
Thin Provisioning on block storage
For those who aren't aware, when a XenServer SR uses iSCSI or an HBA, virtual disks have always consumed their entire allocated space, regardless of how much of the virtual disk was actually utilized. With Dundee we now have full thin provisioning for all block storage, independent of storage vendor. To take advantage of this, you will need to indicate during SR creation that thin provisioning is required. You will also be given the opportunity to specify the default VDI allocation, which allows users to trade off VDI utilization against storage performance. We know about a number of areas that still need attention, but we are providing early access so that the community can identify issues our testing hasn't yet encountered.
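As a rough sketch, creating such an SR from the CLI might look like the following. The `device-config` keys are the standard ones for the `lvmoiscsi` SR type, but the `sm-config` keys for thin provisioning are my own illustrative assumptions, not confirmed parameter names; check `xe help sr-create` on the Alpha 2 build.

```
# Hypothetical sketch: creating a thin-provisioned iSCSI SR.
# The sm-config keys below are illustrative assumptions only.
xe sr-create name-label="Thin iSCSI SR" type=lvmoiscsi content-type=user \
    device-config:target=<iscsi-target-ip> device-config:targetIQN=<iqn> \
    device-config:SCSIid=<scsi-id> \
    sm-config:allocation=dynamic sm-config:initial_allocation=<size>
```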
NFS version 4
While a simple enhancement, this was identified as a priority item during the Creedence previews last year. We didn't really have the time then to fully implement it, but as of Dundee Alpha 2 you can specify NFS 4 for SR creation in XenCenter.
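XenCenter exposes this as an option in the New SR wizard; on the CLI the equivalent, to the best of my knowledge, is the `nfsversion` device-config key. The server address and export path below are placeholders.

```
# Create an NFS SR requesting NFSv4 (nfsversion key as I understand it;
# verify against your build with `xe help sr-create`).
xe sr-create name-label="NFSv4 SR" type=nfs content-type=user \
    device-config:server=<nfs-server> \
    device-config:serverpath=/export/sr \
    device-config:nfsversion=4
```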
Intel GVT-d graphics
XenServer 6.5 SP1 introduced support for Intel GVT-d graphics in Haswell and Broadwell chips. This support has been ported to Dundee and is now present in Alpha 2. At this point, GPU operations in Dundee should have feature parity with XenServer 6.5 SP1.
CIFS for virtual disk storage
For some time we've had CIFS as an option for ISO storage, but lacked it for virtual disk storage. That has been remedied, and if you are running CIFS you can now use it for all your XenServer storage needs.
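A hedged CLI sketch follows; the SR type name (`smb`) and the `device-config` keys are my assumptions for this alpha, so verify against `xe help sr-create` before relying on them.

```
# Hedged sketch: creating an SMB/CIFS SR for virtual disk storage.
# Share path and credentials are placeholders.
xe sr-create name-label="SMB SR" type=smb content-type=user \
    device-config:server=//<fileserver>/<share> \
    device-config:username=<user> device-config:password=<password>
```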
Changed dom0 disk size
During installation of XenServer 6.5 and prior, a 4 GB partition is created for dom0, with an additional 4 GB partition created as a backup. For some users the 4 GB partition was too limiting, particularly if remote syslog wasn't used or when third-party monitoring tools were installed in dom0. With Dundee we've completely changed the local storage layout for dom0, and this has significant implications for all users wishing to upgrade to Dundee.
The new partition layout will consume 46 GB of local storage. If less than 46 GB is available, a fresh install will fail. The new partition layout is as follows:
- 512 MB UEFI boot partition
- 18 GB dom0 partition
- 18 GB backup partition
- 4 GB logs partition
- 1 GB SWAP partition
As you can see from this new partition layout, we've separated logs and swap out of the main operating partition, and we now support UEFI boot.
During upgrade, if at least 46 GB is available, we will recreate the partition layout to match that of a fresh install. In the event that 46 GB isn't available, we will shrink the existing dom0 partition from 4 GB to 3.5 GB and create the 512 MB UEFI boot partition.
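As a quick sanity check, the listed partitions add up to well under the 46 GB requirement; presumably the remainder is headroom for alignment and future layout changes (my assumption, not something stated in the release notes):

```shell
# Sum the new partition layout in MiB (treating the GB figures as GiB).
total_mib=$((512 + 18*1024 + 18*1024 + 4*1024 + 1*1024))
echo "partitions total: ${total_mib} MiB"   # prints 42496 MiB
echo "roughly $((total_mib / 1024)) GiB of the 46 GB required"
```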
Downloading Alpha 2
Dundee Alpha 2 is available for download from xenserver.org/prerelease
It's spring time and after a particularly brutal winter here in Boston, I for one am happy to see the signs of spring. Grass and greenery, flowers budding, and warmer days all speak to good things coming. It's also time to unveil the next major XenServer project, code named Dundee. As with Creedence last year, we're going to be giving early access to a major new version of XenServer well in advance of its release. This project will have its share of functional improvements, and a few new features, but just like last year we're going to start with the platform and progress slowly.
CentOS 7 dom0
During the Creedence pre-release program, many commented on "Why CentOS 5.x? CentOS 6 has been out for a while, and 7 is fresh". The answer to that question was pretty simple. We knew what userspace looked like with CentOS 5.x, and our users understood how to manage a CentOS 5.x system. CentOS 5 was supported upstream until 2017, so there was no risk of us shipping something unsupported. Moving to CentOS 6.5 would've been a valid option if we didn't already plan on moving to CentOS 7, but we didn't want to change dom0 only to change it again in a year's time. Plus, if you recall, we took on quite a bit with Creedence in 2014.
So we're now a year later, and CentOS 7 makes perfect sense for dom0. Not only are there a few more upstream patches available, but Linux admins are now more comfortable with the changes in management paradigm. It's also those changes in paradigm which may present issues for you, our users, and why this first alpha is all about validation. If you manage XenServer from a tool which uses the xapi SDK, then you shouldn't experience too many problems. On the other hand, if you have favorite scripts, or tweaks you've made to configuration files, then you could be in for some extra work.
Now is also a perfect time to remind everyone that when you "upgrade" a XenServer, it's not an in-place upgrade. We preserve the configuration files we know about, and then dom0 is reimaged. Any third-party packages you have installed, custom scripts, and manual configuration changes have a good chance of being lost unless you've backed them up. In this case, with a move to CentOS 7, it's also possible that those items will need to be reworked to some degree.
Understanding the pre-release process
All pre-release downloads will be on our pre-release download page. We'll be providing new tagged builds every few weeks, generally as we achieve internal milestones. With each build, we'll call out something which you, as an interested participant in XenServer Dundee, should be looking at. Issues encountered can be logged in the incident database at https://bugs.xenserver.org. Since the incident database covers more than one version of XenServer, please make certain you report Dundee issues under the "Dundee" version. Of course there is no guarantee we'll be able to resolve what you find, but we do want to know about it. With this first alpha, we're interested in the "big issues" you may hit, i.e. areas which would block use of features or functionality, or cases where there is a major impact. These are really useful as the product develops and matures during the alpha stage. If you are developing something for XenServer, we invite you to ask your questions on the development mailing list, but do remember it's not a product support list.
Lastly, while we're in a pre-release period, it's also likely you may eventually encounter functionality which may form part of a commercial edition. At this point we're not committing to what functionality will actually ship, when it might ship, or whether it'll require a commercial license. I understand that might be concerning, but it shouldn't be. If something is destined for a commercial edition, you'll see it "commercialized" in a Citrix Tech Preview before we release. Historically we're many months away from when a Tech Preview might happen, so right now the most important thing is to focus on the changes we're interested in your feedback on today - everything to do with a CentOS 7 dom0.
Download Dundee alpha.1: http://xenserver.org/preview