Virtualization Blog

Discussions and observations on virtualization.

Resetting Lost Root Password in XenServer 6.2

The Situation

Bad things can happen... badly.  In this case the root password to manage a XenServer (version 6.2) was... lost.

Physical and remote logins to the XenServer 6.2 host failed authentication, naturally, and XenCenter had been disconnected from the host, prompting an administrator to provide these precious credentials, but in vain.

An Alternate Situation

Had XenCenter been left open (offering command line access to the XenServer host in question), the following command could have been used from the XenServer's command line to initiate a root password reset:

passwd

Once the root user's password has been changed, the connection from XenCenter to the XenServer host will need to be re-established using the root username and the "new" password.

Once connected, the remainder of this article becomes irrelevant; otherwise, you may very well need to...

Boot into Linux Single User Mode

Be it forgetfulness, a change of the guard, another administrator changing the password, or simply a typo in company documentation, the core problem being addressed in this post is that one cannot connect to XenServer 6.2 because the root password is... lost or forgotten.

As a secondary problem, one has lost patience and has obtained physical or iLO/iDRAC access to the XenServer in question, but still the root password is not accepted:

 

The Shortest Solution: Breaking The Law of Physical Security

I am not encouraging hacking, but physical interaction with the XenServer in question and altering its boot into "linux single user mode" is the last resort for this problem.  To do this, one will need to have (and understand) the following:

  • Physical access, iLO, iDRAC, etc.
  • A reboot of the XenServer in question will be required

With disclaimers aside, I highly recommend reading and reviewing the steps outlined below before going through the motions.

Some steps are time sensitive, so being prepared is a key part of the overall plan.

  1. After gaining physical or iLO/iDRAC access to the XenServer in question, reboot it!  With iLO and iDRAC, there are options to hard or soft reset a system and either option is fine.
  2. Burn the following image into your mind: after the server reboots and runs through its hardware BIOS/POST tests, you will see the following prompt for 5 seconds (or so):
  3. Immediately grab the keyboard and enter the following:
    menu.c32 (press enter)
  4. The menu.c32 boot prompt will appear and, again, you will only have 5 or so seconds to select the "XE" entry and press Tab to edit its boot options:
  5. Now, at the bottom of the screen one will see the boot entry information.  Don't worry, you have time here, so make sure it is similar to the following:
  6. Near the end of the boot entry, one should see "console=tty0 quiet vga=785 splash quiet": replace "quiet vga=785 splash" with "linux single".  More specifically, without the quotes, such as the following (an illustrative before/after sketch also follows this list):
    linux single
  7. With that completed, simply press Enter to boot into Linux's single user mode.  You should eventually be dropped into a command line prompt (as illustrated below):
  8. Finally, we can reset the root password to something one can remember by executing the Linux command:
    passwd

  9. When prompted, enter the new root user password: you will be asked to verify it and upon success you should see the following:
  10. Now, enter the following command to reboot the XenServer in question:
    reboot
  11. Obviously, this will reboot the XenServer as illustrated below:
  12. Let the system fully reboot and present the xsconsole.  To verify that the new password has taken effect, select "Local Command Shell" from xsconsole.  This will require you to authenticate as the root user:
  13. If successful you will be dropped to the local command shell and this also means you can reconnect and manage this XenServer via XenCenter with the new root password!
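
For reference, here is a minimal sketch of the edit from step 6 and the commands from steps 8 and 10. The boot entry is abbreviated with "..." because the exact paths and arguments differ between installations; only the replacement of "quiet vga=785 splash" with "linux single" matters:

    # Boot entry before the edit (abbreviated, illustrative):
    #   ... console=tty0 quiet vga=785 splash quiet
    # Boot entry after the edit:
    #   ... console=tty0 linux single quiet

    # Once dropped at the single user mode prompt:
    passwd      # set and confirm the new root password
    reboot      # restart the host so it boots normally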

XenServer Creedence Tech Preview and Creedence Alpha

This morning astute followers of XenServer activity noticed that Citrix had made available the previously announced Tech Preview for Creedence. The natural follow on question is how this relates to the alpha program we've been running on xenserver.org. The easy answer is that the Citrix Tech Preview of XenServer Creedence is binary compatible with the alpha.4 pre-release binary you can get from xenserver.org. From the perspective of the core platform (i.e. XenServer virtualization bits), the only difference is in the EULA.

So why run a Tech Preview if you have a successful alpha program already?

That's where the differences between a pure open source effort and a commercial effort begin. While the XenServer platform components are binary compatible, Citrix customers have expectations for features which couldn't be made open source, or implementations directly supporting Citrix commercial products. Perfect examples of these features and implementations can be seen on the Tech Preview download page, such as Microsoft System Center Integration, expanded vGPU support for XenDesktop, and the return of both the DVSC and WLB. While there is no guarantee any of those features or specific implementations will be present in the final Citrix release, or for that matter under what license, Citrix is seeking your input on them and a Tech Preview program is how that is accomplished.

I can't access the Tech Preview site; what's wrong?

If you can't login to the Tech Preview site, that likely means your account isn't associated with a Citrix commercial contract for XenServer. Since the alpha.4 pre-release is binary compatible, you can experience all the platform improvements yourself by downloading XenServer from the xenserver.org pre-release download page.

How do I provide feedback?

If you are able to participate in the Tech Preview program, you'll find the options for feedback listed on the Tech Preview page. Of course, even if you can participate in the Tech Preview program we're always accepting XenServer feedback through our public feedback options:

The XenServer incident database: https://bugs.xenserver.org

Development feedback (xs-devel): https://lists.xenserver.org/sympa/info/xs-devel

What cool things are in alpha.4?

We've saved the best for last! Creedence already includes a 64 bit dom0, an updated Linux 3.10 kernel, an updated Open vSwitch 2.1.2, improved boot storm handling, and read caching for file-based SRs; so what goodies does alpha.4 add for the core platform people? Let's start with TRIM and UNMAP to better reclaim storage, then add in 32 bit to 64 bit VM migration to support upgrade scenarios and storage migration from XenServer 6.2 and prior, all with additional operating system validation for SLES 11 SP3 and Ubuntu 14.04 LTS.

What testing would you like us to do?

Don't let the alpha label fool you, alpha.4 of XenServer Creedence has been through quite a bit of testing, and is very much ready for you to try and stress it. The one thing we can say is that we're still working on the performance tuning so if you push things really hard, dom0 may run out of memory and you might need to follow CTX134951 to increase it (valid values are 8192, 16384 and 32768). This is particularly true if you're running more than 200 VMs per host, or need to attach more than 1200 virtual disks to VMs.     
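
For convenience, the dom0 memory change described in CTX134951 amounts to setting the dom0_mem boot parameter and rebooting. The sketch below is illustrative only and assumes the xen-cmdline helper normally shipped in dom0; follow the CTX article for the supported procedure and pick one of the values listed above:

    # Illustrative sketch; see CTX134951 for the authoritative steps.
    /opt/xensource/libexec/xen-cmdline --set-xen dom0_mem=16384M,max:16384M
    reboot   # the new dom0 memory allocation takes effect on the next boot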


XenServer Storage Performance Improvements and Tapdisk3

Overview

The latest development builds of XenServer (check out the Creedence Alpha Releases) enjoy significantly superior storage performance compared to XenServer 6.2 as already mentioned by Marcus Granado in his blog post about Performance Improvements in Creedence. This improvement is primarily due to the integration of tapdisk3. This blog post will introduce and discuss this new storage virtualisation technology, presenting results for experiments reaching about 10 GB/s of aggregated storage throughput in a single host and explaining how this was achieved.

Introduction

A few months ago I wrote a blog post on Project Karcygwins which covered a series of experiments and investigations we conducted around storage IO. These focused on workloads originating from a single VM and applied to a single virtual disk. We were particularly interested in understanding the virtualisation overhead added to these workloads, especially on low latency storage devices such as modern SSDs. Comparing different storage data paths (e.g. blkback, blktap2) available for use with the Xen Project Hypervisor, we explained why and when any overhead would exist as well as how noticeable it could get. The full post can be read here: http://xenserver.org/blog/entry/karcygwins.html

Since then, we expanded the focus of our investigations to encompass more complex workloads. More specifically, we started to focus on aggregate throughput and what circumstances were required for several VMs to make full use of a storage array’s potential. This investigation was conducted around the new tapdisk3, developed in XenServer by Thanos Makatos. Tapdisk3 was written with a simpler architecture, implemented entirely in user space, leading to substantial performance improvements.

What is new in Tapdisk3?

There are two major differences between tapdisk2 and tapdisk3. The first one is in the way this component is hooked up to the storage subsystem: while the former relied on blkback and blktap2, the latter connects directly to blkfront. The second major difference lies in the way data is transferred to and from guests: while the former used grant mapping and “memcpy”, the latter uses grant copy. For further details, refer to the section “Technical Details” at the end of this post.

Naturally, other changes were required to make all of this work. Most of them, however, are related to the control plane. For these, there were toolstack (xapi) changes and the appearance of a “tapback” component to connect everything up. Because of these changes (and some others regarding how tapdisk3 handles in-flight data), the dom0 memory footprint of a connected virtual disk also changed. This is currently under evaluation and may see further modifications before tapdisk3 is officially released.

Performance Evaluation

In order to measure the performance improvements achieved with tapdisk3, we selected the fastest host and the fastest disks we had available. This is the box we configured for these measurements:

  • Dell PowerEdge R720
    • 64 GB of RAM
    • Intel Xeon E5-2643 v2 @3.5 GHz
      • 2 Sockets, 6 cores per socket, hyper threaded = 24 pCPUs
    • Turbo up to 3.8 GHz
    • Xen Project Hypervisor governor set to Performance
      • Default is set to "On Demand" for power saving reasons
      • Refer to Rachel Berry's blog post for more information on governors
    • BIOS set to Performance per Watt (OS)
    • Maximum C-State set to 1
  • 4 x Micron P320 PCIe SSD (175 GB each)
  • 2 x Intel 910 PCIe SSD (400 GB each)
    • Each presented as 2 SCSI devices of 200 GB (for a total of 4 devices and 800 GB)
  • 1 x Fusion-io ioDrive2 (785 GB)

After installing XenServer Creedence Build #86278 (about 5 builds newer than Alpha 2) and the Fusion-io drivers (compiled separately), we created a Storage Repository (SR) on each available device. This produced a total of 9 SRs and about 2.3 TB of local storage. On each SR, we created 10 RAW Virtual Disk Images (VDI) of 10 GB each. One VDI from each SR was assigned to each VM in a round-robin fashion as in the diagram below. The guest of choice was Ubuntu 14.04 (x86_64, 2 vCPUs unpinned, 1024 MB RAM). We also assigned 24 vCPUs to dom0 and decided not to use pinning (see XenServer 6.2.0 CTX139714 for more information on pinning strategies).

blog.001.png
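
As an aside, a set-up along these lines can be scripted with the xe CLI in dom0. The fragment below is a hedged sketch rather than the authors' actual tooling: the UUID variables are placeholders and the sm-config flag is my assumption about how RAW VDIs are requested on these SRs:

    # Create a 10 GiB RAW VDI on an SR and attach it to a VM as its second disk.
    VDI=$(xe vdi-create sr-uuid="$SR_UUID" name-label=perf-disk \
          type=user virtual-size=10GiB sm-config:type=raw)
    VBD=$(xe vbd-create vm-uuid="$VM_UUID" vdi-uuid="$VDI" device=1 mode=RW type=Disk)
    xe vbd-plug uuid="$VBD"    # the disk typically appears in the guest as /dev/xvdb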

We first measured what aggregate throughput the host would deliver when the VDIs were plugged to the VMs via the traditional tapdisk2-blktap2-blkback data path. For that, we got one VM to sequentially write for 10 seconds on all VDIs (at the same time). We observed the total amount of data transferred. This was done with requests varying from 512 bytes up to 4 MiB. Once completed, we repeated the experiment with an increasing number of VMs (up to ten). And then we did it all again for reads instead of writes. The results are plotted below:

blog.002.png

blog.003.png

In terms of aggregate throughput, the measurements suggest that the VMs cannot achieve more than 4 GB/s when reading or writing. Next, we repeated the experiment with the VDIs plugged with tapdisk3. The results were far more impressive:

blog.004.png

blog.005.png

This time, the workload produced numbers on a different scale. For writing, the aggregate throughput from the VMs approached the 8.0 GB/s mark. For reading, it approached the 10.0 GB/s mark. For some data points in this particular experiment, the tapdisk3 data path proves to be faster than tapdisk2 by ~100% when writing and ~150% when reading. This is an impressive speed up on a metric that users really care about. 
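
For readers who would like to generate a broadly comparable workload inside a guest, the sketch below uses fio for a 10-second sequential run against a single virtual disk. It is illustrative only; the device name, block size and queue depth are assumptions, not the harness used to produce the numbers above:

    # Sequential writes for 10 seconds with 1 MiB requests and direct I/O.
    # Warning: this overwrites the contents of the target disk.
    fio --name=seqwrite --filename=/dev/xvdb --rw=write --bs=1M \
        --direct=1 --ioengine=libaio --iodepth=32 --runtime=10 --time_based
    # Swap --rw=write for --rw=read (and vary --bs), run one job per VDI, and
    # repeat across an increasing number of VMs to approximate the experiments above.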

Technical Details

To understand why tapdisk3 is so much faster than tapdisk2 from a technical perspective, it is important to first review the relevant terminology and architectural aspects of the virtual storage subsystem used with paravirtualised guests and Xen Project Hypervisors. We will focus on the components used with XenServer and generic Linux VMs. Note, however, that the information below is very similar for Windows guests when they have PV drivers installed.

blog.006.png

Traditionally, Linux guests (under Xen Project Hypervisors) load a driver named blkfront. As far as the guest is concerned, this is a driver for a normal block device. The difference is that, instead of talking to an actual device (hardware), blkfront talks to blkback (in dom0) through shared memory regions and event channels (Xen Project’s mechanism to deliver interrupts between domains). The protocol between these components is referred to as the blkif protocol.

Applications in the guest will issue read or write operations (via libc, libaio, etc) to files in a filesystem or directly to (virtual) block devices. These are eventually translated into block requests and delivered to blkfront, being normally associated with random pages within the guest’s memory space. Blkfront, in turn, will grant dom0 access to those pages so that blkback can read from or write to them. This type of access is known as grant mapping.

While the Xen Project developer community has made efforts to improve the scalability and performance of grant mapping mechanisms, there is still work to be done. This is a set of complex operations and some of its limitations are still showing up, especially when dealing with concurrent access from multiple guests. Some notable recent efforts were Matt Wilson's patches to improve locking for better scalability.

blog.007.png

In order to avoid the overhead of grant mapping and unmapping memory regions for each request, Roger Pau Monne implemented a feature called “persistent grants” in the blkback/blkfront protocol. This can be negotiated between domains where supported. When used, blkfront will grant access to a set of pages to blkback and both components will use these pages for as long as they can.

The downside of this approach is that blkfront cannot control which pages are going to be associated with requests that come from the guest’s block layer. It therefore needs to copy data between those requests and the set of persistently granted pages before passing blkif requests to blkback. Even with the added copy, persistent grants are a proven method for increasing scalability under concurrent IO.

Both approaches presented above are entirely implemented in kernel-space within dom0. They also have something else in common: requests issued to dom0’s block layer refer to pages that actually reside in the guest’s memory space. This can trigger a potential race condition when using network-based storage (e.g. NFS and possibly iSCSI); if there is a network packet (which is associated to a page grant) queued for retransmission and an ACK arrives for the original transmission of that same packet, dom0 might retransmit invalid data or even crash (because that grant could either contain invalid data or have already been unmapped).

To get around this problem, XenServer started copying the pages to dom0 instead of using grants directly. This was done by the blktap2 component, which was introduced with tapdisk2 to deliver other features such as thin-provisioning (using the VHD format) and Storage Motion. In this design, blktap2 copies the pages before passing them to tapdisk2, ensuring safety for network-based back ends. The reasoning behind blktap2 was to provide a block device in dom0 that represented the VDI as a full-provisioned device despite its origins (e.g. a thin-provisioned file in an NFS mount).

blog.008.png

As we saw in the measurements above, this approach has its limitations. While it works well for a variety of storage types, it fails to scale in terms of performance with modern technologies such as several locally-attached PCIe SSDs. To respond to these changes in storage technologies, XenServer Creedence will include tapdisk3 which makes use of another approach: grant copy.

blog.009.png

With the introduction of the 3.x kernel series to dom0 and consequently the grant device (gntdev), we were able to access pages from other domains directly from dom0’s user space (domains are still required to explicitly grant proper access through the Xen Project Hypervisor). This technology allowed us to implement tapdisk3, which uses the gntdev and the event channel device (evtchn) to communicate directly with blkfront. However, instead of accessing pages as before, we coded tapdisk3 to use a Xen Project feature called “grant copy”.

Grant copying data is much faster than grant mapping and then copying. With grant copy, pretty much everything happens within the Xen Project Hypervisor itself. This approach also ensures that data is present in dom0, making it safe to use with network-attached backends. Finally, because all the logic is implemented in a user-space application, it is trivial to support thin-provisioned formats (e.g. VHD) and all the other features we already provided such as Storage Motion, snapshotting, fast clones, etc. To ensure a block device representing the VDI is still available in dom0 (for VDI copy and other operations), we continued to connect tapdisk3 to blktap2.

Last but not least, the avid reader might wonder why XenServer is not following the footsteps of qemu-qdisk which implements persistent grants in user space. In order to remain safe for network-based backends (i.e. with persistent grants, requests would be associated with grants for pages that actually lie in guests’ memory space -- just like in Approach 2 above), qemu-qdisk disables the O_DIRECT flag to issue requests to a VDI. This causes data to be copied to/from dom0’s buffer cache (hence guaranteeing safety as requests will be associated with pages local to dom0). However, persistent grants imply that a copy has already happened in the guest and the extra copy in dom0 is simply adding on the latency of serving a request and CPU overhead. We believe grant copy to be a better alternative.

Conclusions

In this post I compared tapdisk2 to tapdisk3 by showing performance results for aggregated workloads from sets of up to ten VMs. This covered a variety of block sizes over read and write sequential operations. The experiment took place on a modern and fast Intel-based server using state-of-the-art PCIe SSDs. It showed tapdisk3’s superiority in terms of design and consequently performance. For those interested in what happens under the hood, I went further and compared the different virtual data paths used in Xen Project Hypervisors with focus on XenServer and Linux guests.

This is also a good opportunity to thank and acknowledge XenServer Storage Engineer Thanos Makatos’s brilliant work and effort on tapdisk3 as well as everyone else involved in the project: Keith Petley, Simon Beaumont, Jonathan Davies, Ross Lagerwall, Malcolm Crossley, David Vrabel, Simon Rowe and Paul Durrant.


Overview of the Performance Improvements between XenServer 6.2 and Creedence Alpha 2

The XenServer Creedence Alpha 2 has been released, and one of the main focuses in Alpha 2 was the inclusion of many performance improvements that build on the architectural improvements seen in Alpha 1. This post will give you an overview of these performance improvements in Creedence, and will start a series of in-depth blog posts with more details about the most important ones.

Creedence Alpha 1 introduced several architectural improvements that aim to improve performance and fix a series of scalability limits found in XenServer 6.2:

  • A new 64-bit Dom0 Linux kernel. The 64-bit kernel will remove the cumbersome low/high-memory division present in the previous 32-bit Dom0 kernel, which limited the maximum amount of memory that Dom0 could use and which added memory access penalties in a Dom0 with more than 752MB RAM. This means that the Dom0 memory can now be arbitrarily scaled up to cope with memory demands of the latest vGPU, disk and network drivers, support for more VMs and internal caches to speed up disk access (see, for instance, the Read-caching section below).

  • Dom0 Linux kernel 3.10 with native support for the Xen Project hypervisor. Creedence Alpha 1 adopted a very recent long-term Linux kernel. This modern Linux kernel contains many concurrency, multiprocessing and architectural improvements over the old xen-Linux 2.6.32 kernel used previously in XenServer 6.2. It contains pvops features to run natively on the Xen Project hypervisor, and streamlined virtualization features used to increase datapath performance, such as a grant memory device that allows Dom0 user space processes to access memory from a guest (as long as the guest agrees in advance). Additionally, the latest drivers from hardware manufacturers containing performance improvements can be adopted more easily.

  • Xen Project hypervisor 4.4. This is the latest Xen Project hypervisor version available, and it improves on the previous version 4.1 on many accounts. It vastly increases the number of virtual event channels available for Dom0 -- from 1023 to 131071 -- which can translate into a correspondingly larger number of VMs per host and larger numbers of virtual devices that can be attached to them. XenServer 6.2 was using a special interim change that provided 4096 channels, which was enough for around 500 VMs per host with a few virtual devices in each VM. With the extra event channels in version 4.4, Creedence Alpha 1 can have each of these VMs endowed with a richer set of virtual devices. The Xen Project hypervisor 4.4 also handles grant-copy locking requests more efficiently, improving aggregate network and disk throughput; it facilitates future increases to the supported amount of host memory and CPUs; and it adds many other helpful scalability improvements.

  • Tapdisk3. The latest Dom0 disk backend design has been enabled by default for all the guest VBDs. While the previous tapdisk2 in XenServer 6.2 would establish a datapath to the guest in a circuitous way via a Dom0 kernel component, tapdisk3 in Creedence Alpha 1 establishes a datapath connected directly to the guest (via the grant memory device in the new kernel), minimizing latency and using less CPU. This results in big improvements in concurrent disk access and a much larger total aggregate disk throughput for the VBDs. We have measured aggregate disk throughput improvements of up to 100% on modern disks and machines accessing large blocksizes with large number of threads and observed local SSD arrays being maxed out when enough VMs and VBDs were used.

  • GRO enabled by default. The Generic Receive Offload is now enabled by default for all PIFs available to Dom0. This means that for GRO-capable NICs, incoming network packets will be transparently merged by the NIC and Dom0 will be interrupted less often to process the incoming data, saving CPU cycles and scaling much better with 10Gbps and 40Gbps networks. We have observed incoming single-stream network throughput improvements of 300% on modern machines (a quick way to check the GRO setting on a PIF is sketched just after this list).

  • Netback thread per VIF. Previously, XenServer 6.2 would have one netback thread for each existing Dom0 VCPU and a VIF would be permanently associated with one Dom0 VCPU. In the worst case, it was possible to end up with many VIFs forcibly sharing the same Dom0 VCPU thread, while other Dom0 VCPU threads were idle but unable to help. Creedence Alpha 2 improves this design and gives each VIF its own Dom0 netback thread that can run on any Dom0 VCPU. Therefore, the VIF load will now be spread evenly across all Dom0 VCPUs in all cases.
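
As referenced in the GRO item above, the offload state of a physical interface can be inspected from Dom0 with the standard ethtool utility; this is a generic Linux sketch and the interface name is a placeholder:

    # Show whether Generic Receive Offload is active on a NIC (replace eth0):
    ethtool -k eth0 | grep generic-receive-offload
    # Toggle it manually if required:
    ethtool -K eth0 gro on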

Creedence Alpha 2 then introduced a series of extra performance enhancements on top of the architecture improvements of Creedence Alpha 1:

  • Read-caching. In some situations, several VMs are all cloned from the same base disk and so share much of their data, while the few different blocks they write are stored in differencing disks unique to each VM. In this case, it is useful to cache the contents of the base disk in memory, so that all the VMs can benefit from very fast access to the contents of the base disk, reducing the amount of I/O going to and from physical storage. Creedence Alpha 2 introduces this read caching feature, enabled by default, which we expect to yield substantial performance improvements in the time it takes to boot VMs and in other desktop and server applications where the VMs are mostly sharing a single base disk.

  • Grant-mapping on the network datapath. The pvops-Linux 3.10 kernel used in Alpha 1 had a VIF datapath that would need to copy the guest's network data into Dom0 before transmitting it to another guest or host. This memory copy operation was expensive and it would saturate the Dom0 VCPUs and limit the network throughput. A new design was introduced in Creedence Alpha 2, which maps the guest's network data into Dom0's memory space instead of copying it. This saves substantial Dom0 VCPU resources that can be used to increase the single-stream and aggregate network throughput even more. With this change, we have measured network throughput improvements of 250% for single-stream and 200% for aggregate stream over XenServer 6.2 on modern machines. 

  • OVS 2.1. An openvswitch network flow is a match between a network packet header and an action such as forward or drop. In OVS 1.4, present in XenServer 6.2, a flow had to be an exact match for the header. A typical server VM could have hundreds or more connections to clients, and OVS would need a flow for each of these connections. If the host had too many such VMs, the OVS flow table in the Dom0 kernel would become full and cause many round-trips to the OVS userspace process, significantly degrading the network throughput to and from the guests. Creedence Alpha 2 has the latest OVS 2.1, which supports megaflows. Megaflows are simply a wildcarded language for the flow table, allowing OVS to express a flow as a group of matches, therefore reducing the number of required entries in the flow table for the most common situations and improving the scalability of Dom0 when handling many server VMs connected to a large number of clients.
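
To see megaflows in action on a host using the vswitch backend, the kernel datapath's flow table can be dumped with the standard OVS tools; this is a generic sketch using standard OVS commands, nothing Creedence-specific:

    ovs-vsctl --version    # confirm the Open vSwitch version in use (2.1.x on Creedence Alpha 2)
    ovs-dpctl dump-flows   # datapath flows; masked/wildcarded fields indicate megaflows at work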

Our goal is to make Creedence the most scalable and fastest XenServer release yet. You can help us in this goal by testing the performance features above and verifying if they boost the performance you can observe in your existing hardware.

Debug versus non-debug mode in Creedence Alpha

The Creedence Alpha releases use by default a version of the Xen Project hypervisor with debugging mode enabled to facilitate functional testing. When testing the performance of these releases, you should first switch to using the corresponding non-debugging version of the hypervisor, so that you can unleash its full potential suitable for performance testing. So, before you start any performance measurements, please run in Dom0:

cd /boot
ln -sf xen-*-xs?????.gz xen.gz   #points to the non-debug version of the Xen Project hypervisor in /boot

Double-check that the resulting xen.gz symlink is pointing to a valid file and then reboot the host.

You can check if the hypervisor debug mode is currently on or off by executing in Dom0:

xl dmesg | fgrep "Xen version"

and checking if the resulting line has debug=y or debug=n. It should be debug=n for performance tests.

You can reinstate the hypervisor debugging mode by executing in Dom0:

cd /boot
ln -sf xen-*-xs?????-d.gz xen.gz   #points to the debug (-d) version of the Xen Project hypervisor in /boot

and then rebooting the host.

Please report any improvements and regressions you observe on your hardware to the xs-devel mailing list. And keep an eye out for the next installments of this series!


XenServer Creedence Alpha 2 Released

We're pleased to announce that XenServer Creedence Alpha 2 has been released. Alpha 2 builds on the capabilities seen in Alpha 1, and we're interested in your feedback on this release. With Alpha 1, we were primarily interested in receiving basic feedback on the stability of the code; with Alpha 2 we're interested in feedback not only on basic operations, but also on storage performance.

The following functional enhancements are contained in Alpha 2.

  • Storage read caching. Boot storm conditions in environments using common templates can create unnecessary IO on shared storage systems. Storage read caching uses free dom0 memory to cache common read IO and reduce the impact of boot storms on storage networks and NAS devices.
  • DM Multipath storage support. For users of legacy MPP-RDAC, this functionality has been deprecated in XenServer Creedence following storage industry practices. If you are still using MPP-RDAC with XenServer 6.2 or prior, please enter an incident in https://bugs.xenserver.org to record your usage such that we can develop appropriate guidance.
  • Support for Ubuntu 14.04 and CentOS 5.10 as guest operating systems

The following performance improvements were observed with Alpha 2 compared to Alpha 1, but we'd like to hear your experiences.

  • GRO enabled physical network to guest network performance improved by 65%
  • Aggregate network throughput improved by 50%
  • Disk IO throughput improved by 100%

While these improvements are rather impressive, we do need to be aware this is alpha code. What this means in practice is that when we start looking at overall scalability the true performance numbers could go down a bit to ensure stable operations. That being said, if you have performance issues with this alpha we want to hear about them. Please also look to this blog space for updates from our performance engineering team detailing how some of these improvements were measured.

 

Please do download XenServer Creedence Alpha 2, and provide your feedback in our incident database.     


The reality of a XenServer 64 bit dom0

One of the key improvements in XenServer Creedence is the introduction of a 64 bit control domain (dom0). This is something I've heard requests for over the past several years, and while there are many reasons to move to a 64 bit dom0, there are some equally good reasons for us to have waited this long to make the change. It's also important to distinguish between a 64 bit hypervisor, which XenServer has always used, and a 64 bit control domain which is the new bit.

Isn't XenServer 64 bit bare-metal virtualization?

The good news for people who think XenServer is 64 bit bare metal virtualization is they're not wrong. All versions of XenServer use the Xen Project hypervisor, and all versions of XenServer have configured that hypervisor to run in 64 bit mode. In a Xen Project based environment, the first thing the hypervisor does is load its control domain (dom0). For all versions of XenServer prior to Creedence, that control domain was 32 bit.

The goodness of 64 bit

A 64 bit Linux dom0 allows us to remove a number of bottlenecks which were arbitrarily present with a 32bit Linux dom0.

Goodbye low memory

One of the biggest limiting factors with a 32bit Linux dom0 is the concept of "low memory". On 32 bit computers, the maximum directly addressable memory is 4GB. There are a variety of extensions available to address beyond the 4GB limit, but they all come with a performance and complexity cost. Further, as 32bit Linux evolved, a concept of "Low" memory and "High" memory was created, with most everything related to the kernel using low memory and userspace processes running in high memory. Since kernel operations consume low memory, any kernel memory maps and kernel drivers also consume low memory, and you can see how low memory quickly becomes a precious commodity. Low memory is also consumed increasingly as more memory is made available to a 32bit Linux dom0. In a typical XenServer installation this value is only 752MB, and with the move to a 64bit dom0 this limit will soon be a thing of the past.
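
For the curious, the low/high split is easy to observe on any 32-bit dom0 with highmem enabled; this is plain Linux, nothing XenServer-specific:

    # On a 32-bit kernel, /proc/meminfo reports the split explicitly:
    grep -iE '^(low|high)' /proc/meminfo   # LowTotal/LowFree vs HighTotal/HighFree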

Improved driver stability

In a 32bit BIOS, MMIO regions are always placed between the 1GB and 3GB physical address space, and with a 64bit BIOS they are always placed at the top of the physical memory. If an MMIO hole is created dom0 can choose to re-map that memory so that it is now shadowed in virtual address space. The kernel must map a roughly equal amount of memory to the size of the MMIO holes, which in a 32bit kernel must be done in low memory. Many drivers map kernel memory to MMIO holes on demand, resulting in boot time success but potential instability in the driver if there is insufficient low memory to satisfy the dynamic request for a MMIO hole remap. Additionally, while XenServer currently supports PAE, and can address more than 4GB, if the driver insists on having its memory allocated in 32bit physical address space, it will fail.

Support for more modern drivers

Hardware vendors want to ensure the broadest adoption for their devices, and with modern computers shipping with 64bit processors and memory configurations well in excess of 4GB, the majority of those drivers have been authored for 64bit operating systems, and 64bit Linux is no different. Moving to a 64bit dom0 gives us an opportunity to incorporate those newer devices and potentially have fewer restrictions on the quantity of devices in a system due to kernel memory usage.

Improved computational performance

During the initial wave of operating systems moving from 32bit to 64bit configurations, one of the often cited benefits was the computational improvements offered from such a migration. We fully intend for dom0 to take advantage of general purpose improvements offered from the processor no longer needing to operate in what is today more of a legacy configuration. One of the most immediate benefits is with a 64bit dom0, we can use 64bit compiler settings and take advantage of modern processor extensions. Additionally, there are several performance concessions (such as the quantity of event channels) and code paths which can be optimized when compiled for a 64bit system versus a 32bit system.

Challenges do exist, though

While there are clear benefits to a 64 bit migration, the move isn't without its set of issues as well. The most significant issue relates to XenServer pool communications. In a XenServer pool, all hosts have historically been required to be at the same version and patch level except during the process of upgrading the pool. This restriction serves to ensure that any data being passed between hosts, and configuration information being replicated in xenstore and all operations are consistent. With the introduction of Storage XenMotion in XenServer 6.1, this requirement was also extended to cross pool operations when a vdi is being migrated.

 

In essence you can think of the version requirement as being the insurance against bad things happening to your precious VM cargo. Unfortunately, that requirement has an assumption of dom0 having the same architecture and our migration from 32bit to 64bit complicates things. This is due to the various data structures used in pool operations having been designed with a consistent view of a world based on a 32bit operating system. This view of the world is directly challenged with a pool consisting of both 32bit and 64bit XenServer hosts as would be present during an upgrade. It's also challenged during cross pool storage live migration if one pool is 32bit while the second is 64bit. We are working to resolve this problem at least for the 32bit to 64bit upgrade, but it will be something we're going to want very active participation from our user community to test once we have completed our implementation efforts. After all we do subscribe to the notion that your virtual machines are pets and not cattle; regardless of the scale of your virtualized infrastructure.     


Validation of the Creedence Alpha

On Monday May 19th, early access to XenServer Creedence builds started from xenserver.org.  The xenserver.org community has access to XenServer pre-release installation media of alpha quality and is invited to provide feedback on it.

This blog describes the validation and system testing performed on the first alpha build.

Test Inventory

The XenServer development process incorporates daily automated regression testing complemented by various additional layers of testing, both automated and manual, that are executed less frequently.

In outline, these are the test suites and test cycles executed during XenServer development.

  • Automated short-cycle regression testing (“BVT”) for fast feedback to developers – run on every build on every branch.
  • Automated medium-cycle regression testing (“BST”) to maintain quality on team branches.
  • Automated long-cycle system regression testing (“Nightly”) – run on select builds on select branches, aimed at providing wide regression coverage on a daily basis.
  • Automated performance regression testing, measuring several hundred key performance indicators – run on select builds on select branches on a regular basis.
  • Automated stress testing (huge numbers of lifecycle operations on single hosts) – run once per week on average.
  • Automated pool stress testing (huge numbers of lifecycle and storage operations on XS pools) – run once per week on average.
  • Automated long-cycle system regression testing (“Full regression”) – run on select builds on select branches, aimed at providing extensive test coverage; this comprises a huge number of tests and takes several days to run, so it is executed once every two weeks on average.
  • Automated large-scale stability testing (huge numbers of VMs on large XS pools, boot storms and other key ‘scale’ use-cases) – run on demand, usually several times in the run-up to a product release and ahead of key internal milestones, including deliveries to other Citrix product groups.
  • Automated soak testing – run on demand, usually several times in the run-up to a product release and ahead of key internal milestones; this comprises long-running tests aimed at validating XS over an extended time period.
  • Automated upgrade testing – run ahead of key milestones and deliveries to validate upgrade procedures for new releases.
  • Manual testing – exploratory testing using XenCenter, aimed principally at edge cases and scenarios that are not well covered by automation; cycles of manual testing are carried out on a regular basis and ahead of key milestones and deliveries.

Exit Criteria

Each stage of a XenServer release project requires different test suites to have been run “successfully” (usually meaning a particular pass-rate has been achieved and/or failures are understood and deemed acceptable).

However, test pass rates are only a barometer of quality – if one test out of a hundred fails then that may not matter, but on the other hand, what if that one test case failure represents a high-impact problem affecting a common use-case? For this reason we also use defect counts and impact analyses as part of the exit criteria.

XenServer engineering maintains a high quality bar throughout the release cycle – the “Nightly” automated regression suite comprising several thousand test cases must always achieve over 95%. If it does not, then new feature development stops while bugs are fixed and code reverted until a high pass rate is restored.

The Alpha.next release is a drop from the Creedence project branch that has achieved the following pass rates on the following test suites:

  • Nightly regression – 96.5%
  • Stress – passed (no pass rate for this suite)
  • Pool stress – passed (no pass rate for this suite)
  • General regression – 91.3%

Drops later in the project lifecycle (e.g. Tech Preview) will be subjected to more testing and to more stringent exit criteria.

More Info

For more information on the automation framework used for these tests, please read my blog about XenRT!


Public availability of XenServer SDK development snapshots

Those of you developing against XenServer and wishing to test early and often while the product evolves will be pleased to hear that nightly snapshots of the XenServer Software Development Kit created from the trunk and Creedence branches are now available for download on our development snapshots page and our Creedence Alpha release page respectively.

There are five SDKs, one for each of the C, C#, Java, PowerShell and Python programming languages, all available as a single zip file download. They are generally backwards compatible with older XenServer versions, but it is always preferable to use them against the most recent version. For usage information and further details on developing products for XenServer please visit our XenServer SDK Overview page.

What's new in the SDK with the Creedence Alpha release

Since the latest release of the SDK for XenServer 6.2.0 Service Pack 1 with Hotfix XS62ESP1004 we have fixed several bugs and implemented a number of improvements, the most important of which are the following:

PowerShell SDK

  • Previously released as a PowerShell v2.0 Snap-In, the XenServer PowerShell SDK is now shipped as a more versatile PowerShell v2.0 Module.
  • XenServer's per-host HTTP interface has now been exposed in the PowerShell SDK, enabling users to perform operations such as VM import and export, patch upload, and retrieval of performance statistics and VNC consoles (a brief connection sketch follows below).
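
As a taster, connecting with the new module and listing VMs looks roughly like the sketch below. The module and cmdlet names reflect my understanding of the shipped PowerShell SDK, and the host URL and credentials are placeholders; check the SDK documentation for the definitive usage:

    # Import the module from the SDK zip and open a session to a host.
    Import-Module XenServerPSModule
    Connect-XenServer -Url https://my-xenserver-host -UserName root -Password 'secret'
    Get-XenVM | Select-Object -ExpandProperty name_label   # list VM name labels
    Disconnect-XenServer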

C# SDK

  • XML documentation has been added to the class methods and public properties.

C SDK

  • Support has been provided for building on Windows machines with cygwin.

Other changes

  • The XenAPI reference is now shipped within the XenServer-SDK.zip file in both pdf and html format.

XenCenter Usability Enhancements

XenCenter, the Windows management console for XenServer, needs no introduction. From XenCenter server administrators can perform common operations such as starting, stopping and migrating VMs, managing the XenServer resource pool which hosts those VMs, as well as obtaining an overview of their system and monitoring its overall health status.

Being an intuitive graphical user interface, XenCenter is expected to offer a seamless experience, allowing users to navigate effortlessly through the resources of their system and access vital information easily and in time. Is this always the case though?

How user-friendly is XenCenter?

The XenCenter team conducted a user experience survey among a number of XenDesktop development and test engineers using XenCenter on a daily basis to carry out typical tasks such as creating and configuring VMs, taking VM snapshots and monitoring or troubleshooting VMs. While there were many positive comments, the survey also revealed certain usability problems:

  • The functionality to obtain alternative views of the managed resources is under the drop-down menu above the main tree-view, where many users had not found it.
  • Similarly, the options to perform complex searches on these resources are hidden under a drop-down menu at the top of the Search tab.
  • System notifications are accessible by clicking the button at the top right of the application window and launching the System Alerts dialog, whereas errors generated when XenCenter attempted to carry out a task are in a separate place, on the Logs tab of each resource.
  • It is difficult to obtain an overview of running events and to monitor their status, because doing so requires iterating through all the managed resources and examining the Logs tab of each one of them.

b2ap3_thumbnail_xc1.png

So, what has changed?

The XenCenter team invested in redesigning the XenCenter interface to address the issues described above and enhance its usability.

The first obvious change is the implementation of the Outlook-style navigation buttons at the left of the application window. The top four buttons, Infrastructure, Objects, Organization Views and Saved Searches, replace the old drop-down menu at the top of the tree-view, offering one-click access to the different views of managed resources and a consistent way of browsing these resources by location, type, attribute or a pre-saved custom filter respectively.

b2ap3_thumbnail_xc2.png

Similarly, working with complex searches on the system resources has been simplified by redesigning the controls of the Search tab, thus increasing the visibility of its different functions and improving the workflow for creation and modification of complex search queries.

b2ap3_thumbnail_xc3.png

Notifications, the last of the new Outlook-style buttons, provides access to a central location where the administrator can obtain an overview of all messages generated by the system and address them in a consistent fashion, while still being able to differentiate between the different types of notifications.

The first of the notification views, Alerts, replaces the old System Alerts dialog and is a powerful interface with capabilities of filtering alerts by severity, date or location, collectively exporting or dismissing alerts, as well as addressing them individually by means of the new action buttons at the right of the list.

b2ap3_thumbnail_xc4.png

It should be noted that notifications on software updates are no longer included in the system alerts, but are now more noticeable by being displayed separately on the second of the notification views, Updates, which replaces the old Check for Updates dialog. Consistent with the Alerts view, the new pane offers capabilities for filtering the updates by location or date, as well as one-click initiation of the download and installation process via the new action buttons.

b2ap3_thumbnail_xc5.png

The last of the notification views, Events, which replaces the Logs tab, is the one-stop pane for viewing and monitoring the status of all the events taking place in the system, regardless of the object selected on one of the other navigation views. The events can be filtered by progress status, location or date, while the convenient action buttons allow, among other things, cancellation of a process in progress and single-click navigation back to the relevant object on the Infrastructure view.

b2ap3_thumbnail_xc6.png

Even if the user is not currently viewing one of the Alerts, Updates or Events panes, new notifications requiring attention are still noticeable thanks to the red blob on the Notifications button, which reports the total number of alerts, updates and error events occurring in the system.

The new interface is great, I want it now!

The redesigned XenCenter is code complete and available within the XenServer Creedence Alpha release.


XenServer.next Alpha Available for Download

XenServer.next Alpha Available

The XenServer engineering team is pleased to announce the availability of an alpha of the next release of XenServer, code named “Creedence”. XenServer Creedence is intended to represent the latest capabilities in XenServer with a target release date determined by feature completeness. Several key areas have been improved over XenServer 6.2, and significantly we have also introduced a 64 bit control domain architecture and updated the Xen Project hypervisor to version 4.4. Due to these changes, we are requesting that tests using this alpha be limited to core functionality such as the installation process and basic operations like VM creation, start and stop. Performance and scalability tests should be deferred until a later build is nominated to alpha or beta status.

This is pre-release code and as such isn’t appropriate for production use, and is unlikely to function properly with provisioning solutions such as Citrix XenDesktop and Citrix CloudPlatform. It is expected that users of Citrix XenDesktop and Citrix CloudPlatform will be able to begin testing Creedence within the XenServer Tech Preview time-frame announced at Citrix Synergy. In preparation for the Tech Preview, all XenServer users, including those running XenDesktop, are encouraged to validate if Creedence is able to successfully install on their chosen hardware.

Key Questions

When does the alpha period start?

The alpha period starts on May 19th 2014

When does the alpha period end?

There is no pre-defined end to the alpha period. Instead, we’re providing access to nightly builds and from those nightly builds we’ll periodically promote builds to “alpha.x” status. The promotion will occur as key features are incorporated and stability targets are reached. As we progress the alpha period will naturally transition into a beta or Tech Preview stage ultimately ending with a XenServer release. Announcements will be made on xenserver.org when a new build is promoted.

Where do I get the build?

The build can be downloaded from xenserver.org at: http://xenserver.org/index.php?option=com_content&view=article&layout=edit&id=142

If I encounter a defect, how do I enter it?

Defects and incidents are expected with this alpha, and they can be entered at https://bugs.xenserver.org. Users wishing to submit or report issues are advised to review our submission guidelines to ensure they are collecting enough information for us to resolve any issues.

Where can I find more information on Creedence?

We are pleased to announce a public wiki has been created at https://wiki.xenserver.org to contain key architectural information about XenServer; including details about Creedence.

How do I report compatibility information?

The defect system offers Hardware and Vendor compatibility projects to collect information about your environment. Please report both successes and failures for our review.

What about upgrades?

The alpha will not upgrade any previous version of XenServer, including nightly builds from trunk, and there should be no expectation the alpha can be upgraded.

Do I need a new XenCenter?

Yes, XenCenter has been updated to work with the alpha and can be installed from the installation ISO.

Will I need a new SDK?

If you are integrating with XenServer, the SDK has also been updated. Please obtain the SDK for the alpha from the download page.

Where can I ask questions?

Since the Creedence alpha is being posted to and managed by the xenserver.org team, questions asked on Citrix Support Forums are likely to go unanswered. Those forums are intended for released and supported versions of XenServer. Instead we are inviting questions on the xs-devel mailing list, and via twitter to @XenServerArmy. In order to post questions, you will need to subscribe to the mailing list which can be done here: http://xenserver.org/discuss-virtualization/mailing-lists.html. Please note that the xs-devel mailing list is monitored by the engineering team, but really isn’t intended as a general support mechanism. If your question is more general purpose and would apply to any XenServer version, please validate if the issue being experienced is also present with XenServer 6.2 and if so ask the question on the Citrix support forums.  We've also created some guidelines for submitting incidents.


Call for Participation for Xen User Summit

The Xen Project team is once again hosting a Xen Project User Summit, and this year they'd like to get some XAPI and XenServer related discussions going.  This is a perfect opportunity to showcase how you've used XenServer or XCP to solve real world problems, and provide valuable insight to others in the community on how to avoid any problems you've run into.  If you'd like to submit a proposal for a talk, or know of someone who would be perfect to cover cool XenServer/XCP content, then please do consider submitting something.

Important Details

When: September 15th, 2014

Where: New York City, Lighthouse Executive Conference Center

Where to submit topics: Linux Foundation Xen Project User Summit

Deadline: May 31st, 2014

 


Whatever happened to XenServer's Windsor architecture?

At the 2012 Xen Project Developer Summit in San Diego I talked about the evolution of XenServer's architecture, specifically our forward-looking R&D work on a set of architectural changes known as "Windsor". The architecture includes a number of foundational overhauls, such as moving to a 64 bit domain-0 with a PVops kernel and upgrading to the upstream version of qemu (XenServer currently uses a forked Xen Project version and therefore doesn't benefit from new features and improvements made in the more active upstream project). Those of you following the xenserver.org development snapshots will have seen a number of these key component overhauls already.

The more notable changes in the new architecture include various forms of improved modularity within the system including "domain-0 disaggregation" as well as improved intra-component modularity and better internal APIs.

We wanted to do this for various reasons including:

  1. To improve single-host scalability (e.g. the number of VMs and the amount of aggregate I/O the system can sustain) by parallelizing the handling of I/O over a number of driver domains
  2. To enable better multi-host scalability in scale-out cloud environments, primarily by allowing each host to run more independently and therefore reduce the bottleneck effect of the pool master
  3. To create the capability to have additional levels of tenant isolation by having per-tenant driver domains etc.
  4. To allow for possible future third party service VMs (driver domains etc.)


So where are we at with this? In the single-host scalability area, something that Citrix customers care a lot about, we had a parallel effort to try to improve scale and performance in the short term by scaling up domain-0 (i.e. adding more vCPUs and memory) and tactically removing bottlenecks. We actually did better than we expected with this, so it has reduced the urgency to build the "scale-out" disaggregated solution. Some of this work is described in Jonathan Davies' blog posts: How did we increase VM density in XenServer 6.2? and How did we increase VM density in XenServer 6.2? (part 2)

XenServer today does have some (officially unsupported) mechanisms to run driver domains. These have been used within Citrix in a promising evaluation of storage driver domains for a physical appliance running the Citrix CloudBridge product, performing significant amounts of caching-related I/O to a very large number of local SSDs spread across a number of RAID controllers. This is an area where the scale-out parallelism of Windsor is well suited.

On the multi-host scalability side we've made some changes to both XenServer and Apache CloudStack (the foundation of the Citrix CloudPlatform cloud orchestration product) to reduce the load on the pool master and therefore make it possible to use the maximum resource pool size. For the longer term we're evaluating the overlap between XenServer's pool-based clustering and the various forms of host aggregation offered by orchestration stacks such as CloudStack and OpenStack. With the orchestration stacks' ability to manage a large number of hosts do we really need to indirect all XenServer commands through a pool master?

Disaggregation has taken place in the Xen Project XAPI toolstack used in XenServer. A prerequisite to moving the xapi daemon into a service VM was to split the higher level clustering and policy part of the daemon from the low level VM lifecycle management and hypervisor interface. From XenServer 6.1 the latter function was split into a separate daemon called xenopsd with the original xapi daemon performing the clustering and policy functions. In the network management part of the stack a similar split has been made to separate the network control function into xcp-networkd - this created immediate value by having a better defined internal API but is also a prerequisite for network driver domains. The current development version of the XAPI project has had a number of other modularity clean-ups including various services being split into separate daemons with better build and packaging separation.
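If you are curious which of these split-out daemons are present on a given host, a quick look at the dom0 process list will show them. This is purely an illustrative check and the exact process names can vary between builds:

# List the split-out toolstack daemons mentioned above that are running in dom0.
# Process names may differ slightly between XenServer builds.
ps -eo comm= | grep -E '^(xapi|xenopsd|xcp-networkd)' | sort -u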

We're also using intra-component disaggregation for XenServer's virtual GPU (vGPU) support. A "discrete" emulator (DEMU) is used to provide the glue that allows the GPU vendor's control plane multiplexer driver in domain-0 to service the control path parts of the vGPU access from the guest VM. This is done by, in effect, disaggregating qemu and having the DEMU take ownership of the I/O ports associated with the device it is emulating. This mechanism is now being added to the Xen Project to allow other virtual devices to be handled by discrete emulators, perhaps even in separate domains. Eventually we'd like to put the DEMUs and GPU driver into a driver domain to decouple the maintenance (in particular the required kernel version) of domain-0 and the GPU driver.

I view Windsor like a concept car, a way to try out new ideas and get feedback on their value and desirability. Like a concept car some of Windsor's ideas have made it into the shipping XenServer releases, some are coming, some are on the wishlist and some will never happen. Having a forward looking technology pipeline helps us to ensure that we keep evolving XenServer to meet users' needs both now and in the future.


XenServer and the OpenSSL Heartbleed Vulnerability

On April 7th, 2014 a security vulnerability in the OpenSSL library was disclosed and given the moniker "Heartbleed". This vulnerability has received a ton of press, and there is a very nice summary of what this all means on heartbleed.com. Since XenServer includes the OpenSSL libraries, there was the potential it could be impacted as well. The good news for anyone using a released version of XenServer is that all supported versions of XenServer use OpenSSL 0.9.8, which is not affected by Heartbleed. So if you have XenServer in production, you can have confidence in that XenServer deployment.

Of course, since XenServer is open source, there are other ways to deploy XenServer than using a released version. The first is to either build from sources or to take xenserver-core and install it on your preferred Linux distribution. If that was your path to creating a XenServer deployment, then you will need to double-check whether your dom0 distribution is at risk. The second way would be to install XenServer from a nightly snapshot. The bad news is that these nightly snapshots do include a vulnerable version of OpenSSL, but we're working on it. Of course those snapshots aren't considered production ready, and aren't eligible for support from Citrix, but we all know they could be in labs someplace and still should be checked.
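If you want to verify what a particular dom0 is actually running, a quick check from the console is enough. This is a minimal sketch and assumes an RPM-based dom0:

# Report the OpenSSL version installed in dom0 (or any RPM-based dom0 build).
openssl version
rpm -q openssl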

If you're using XenServer as part of a CloudStack deployment, the good folks over at ShapeBlue have put together a blog describing the steps you should follow to mitigate the risk in Apache CloudStack 4.0 through 4.3. A similar checklist exists for OpenStack deployments, and regardless of your chosen cloud orchestration solution, if you have deployed XenServer from released binaries it doesn't contain a vulnerable version of OpenSSL.

 


Patching XenServer at Scale

In January, I posted a how-to guide covering the installation of XenServer in a large scale environment, and this month we're going to talk about patching XenServer in a similar environment. Patching any operating environment is an important aspect of running a production installation, and XenServer is no different. Patching a XenServer host manually can be done in one of two ways: either through XenCenter and its rolling pool upgrade option, or via the CLI. The rolling pool upgrade wizard has been available since XenServer 6.0, and not only applies hotfixes to all the servers in a pool in the correct order, but also ensures any running VMs are migrated if reboots are required. If you prefer to apply the patches using the CLI, it becomes your responsibility to perform the VM migration, but the process is quite simple. XenServer customers with a Citrix support contract can utilize the rolling pool upgrade wizard, while free users have the option of manually patching using the CLI. Of course these two options can be used in a large scale environment, but generally the requirement is to script everything, and that's where this blog comes in.

Assumptions

The core assumption in the script in this blog is that the XenServer hosts are not in a pool. If the hosts are in a pool, then you should apply patches to the pool master first, and then any slaves. Since we're building on the environment in my previous blog which had standalone hosts, this assumption is valid.
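If your hosts are pooled and you adapt the script accordingly, you can identify the pool master first with something like the following (a minimal sketch using standard xe commands):

# Find the pool master so it can be patched before any slaves.
MASTERUUID=$(xe pool-list params=master --minimal)
xe host-list uuid=$MASTERUUID params=name-label,address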

Preparation Steps

  1. Download the desired hotfixes, patches and service packs from either citrix.com (http://support.citrix.com/product/xens/v6.2.0/) or xenserver.org (http://xenserver.org/overview-xenserver-open-source-virtualization/download.html)
  2. Extract the xsupdate file of each patch into a directory on an NFS share
  3. Test each patch to verify it works in your environment. While not required, I always like to do this because QA can't know every possible configuration and bugs do happen.
  4. Create a file named manifest and place it in the same directory as the xsupdate files. The manifest file will contain a single line for each patch, and those patches will be processed in order. An example manifest file is provided below, and any given line can be commented out using the hash (#) character.
    XS62E001.xsupdate
    XS62E002.xsupdate
    XS62E004.xsupdate
    XS62E005.xsupdate
    XS62E009.xsupdate
    XS62E010.xsupdate
    XS62E011.xsupdate
    XS62E012.xsupdate
    XS62ESP1.xsupdate
  5. Create a script file named apply-patches.sh and place it in a known location. The contents of the script will be:
    #!/bin/sh 
    # apply all XenServer patches which have been approved in our manifest
    
    mkdir /mnt/xshotfixes
    mount 192.168.98.3:/vol/exports/isolibrary/xs-hotfixes /mnt/xshotfixes
    
    
    HOSTNAME=$(hostname)
    HOSTUUID=$(xe host-list name-label=$HOSTNAME --minimal)
    while read PATCH
    do 
    if [ "$(echo "$PATCH" | head -c 1)" != '#' ]
    then 
    	PATCHNAME=$(echo "$PATCH" | awk -F: '{ split($1,a,"."); printf ("%s\n", a[1]); }')
    	echo "Processing $PATCHNAME"
    	PATCHUUID=$(xe patch-list name-label=$PATCHNAME hosts=$HOSTUUID --minimal)
    	if [ -z "$PATCHUUID" ]
    	then
    		echo "Patch not yet applied; applying .."
    		PATCHUUID=$(xe patch-upload file-name=/mnt/xshotfixes/$PATCH)
    		if [ -z "$PATCHUUID" ] #empty uuid means patch uploaded, but not applied to this host
    		then
    			PATCHUUID=$(xe patch-list name-label=$PATCHNAME --minimal)
    		fi
    		#apply the patch to *this* host only
    		xe patch-apply uuid=$PATCHUUID host-uuid=$HOSTUUID
    
    		# remove the patch files to avoid running out of disk space in the future
    		xe patch-clean uuid=$PATCHUUID 
    		
    		#figure out what the patch needs to be fully applied and then do it
    		PATCHACTIVITY=$(xe patch-list name-label=$PATCHNAME params=after-apply-guidance | sed -n 's/.*: \(.*\)/\1/p')
    		if [ "$PATCHACTIVITY" = 'restartXAPI' ]
    		then
    			xe-toolstack-restart
    			# give time for the toolstack to restart before processing any more patches
    			sleep 60
    		elif [ "$PATCHACTIVITY" == 'restartHost' ]
    		then
    			# we need to rebot, but we may not be done.
    			# need to create a link to our script
    			
    			# first find out if we're already being run from a reboot
    			MYNAME="`basename \"$0\"`"
    			if [ "$MYNAME" == 'apply-patches.sh' ]
    			then
    				# I'm the base path so copy myself to the correct location
    				cp "$0" /etc/rc3.d/S99zzzzapplypatches  
    			fi
    			
    			reboot
    			exit
    		fi
    		
    	else
    		echo "$PATCHNAME already applied"
    	fi
    	
    fi
    done < "/mnt/xshotfixes/manifest"
    
    echo "done"
    umount /mnt/xshotfixes
    rmdir /mnt/xshotfixes
    
    # lastly if I'm running as part of a reboot; kill the link
    rm -f /etc/rc3.d/S99zzzzapplypatches 

Applying Patches

Applying patches is as simple as running the script file and letting it do what it needs to do. Here's how it works...

  1. We need to find out if the patch has already been applied.
  2. If the patch hasn't been applied, we want to upload it and then apply it. Since any given patch might require the toolstack to be restarted, we check for that and restart the toolstack. Additionally we need to handle the case where the patch might require a reboot. If that's the case, we want to reboot, but also might need to process additional patches. To account for that, we'll insert ourselves into the reboot sequence to keep processing more patches until we've reached the end.
  3. Since we want to be sensitive to disk space usage, we'll cleanup the patch files once each patch has been applied.

 

This script becomes quite valuable when used in conjunction with the provisioning script in my blog on installing XenServer at scale. Simply copy the patch script to /etc/rc3.d/S99zzzzapplypatches, adding that copy command to first-boot-script.sh prior to the final reboot. With the combination of these two scripts, you can now install XenServer at scale and ensure those newly installed XenServer hosts are fully patched from the beginning.
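If you would rather drive patching from a central admin box than at first boot, a simple SSH fan-out over your standalone hosts also works. The sketch below is illustrative only: hosts.txt is a hypothetical file with one management address per line, and remember that apply-patches.sh reboots a host when a patch requires it, which will drop the SSH session for that host.

#!/bin/sh
# Illustrative fan-out: copy and run apply-patches.sh on each standalone host.
# hosts.txt (hypothetical) contains one management IP or hostname per line.
while read HOST
do
    echo "Patching $HOST"
    scp apply-patches.sh root@$HOST:/root/apply-patches.sh
    # -n stops ssh from consuming the host list on stdin
    ssh -n root@$HOST "chmod +x /root/apply-patches.sh; /root/apply-patches.sh"
done < hosts.txt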


XenServer Status – January 2014

The release of true hardware GPU sharing and XenServer 6.2 SP1 was a strong finish to 2013, and based on the feedback from the Citrix Partner Summit a few weeks back, we really are a key differentiator for Citrix XenDesktop, which fulfills one of the roles XenServer has in the ecosystem. Of course this also opens the question of how to get the sources for the cool new bits, and I’ll cover that in just a little bit. Looking beyond the commercial role for XenServer, we also saw significant growth in the core project with visitors, page views, downloads and mailing list activity all up at least 20% compared to December. From the perspective of engineering accomplishments, completed work in Q4 included a move to 4.3 for the Xen Project hypervisor, a move to CentOS 6.4 with the Linux kernel moving to 3.10, and significant work towards a 64 bit dom0, upstream support for Windows PV drivers and blktap3 for storage. All told, this is a fantastic base to build upon in 2014.

Speaking of foundations, as an open source project we have an obligation to our community to provide clear access to the source used to produce XenServer. Unfortunately, it’s become apparent some confusion exists about the state of the project and source code locations. Fundamentally we had a miscommunication where many assumed the sources on xenserver.org and posted in GitHub represented XenServer 6.2, and that code changes which occurred in the GitHub repositories represented the XenServer 6.2 product direction. In reality, XenServer 6.2 represents a fork of XenServer which occurred prior to the creation of xenserver.org, and the code which is part of xenserver.org represents trunk. So what does this mean for those of you looking for code, and, for that matter, wanting to test your solution against the correct binaries? To solve that I’ve created this handy little table:

XenServer 6.1 and prior: source is located on citrix.com within the downloads section
XenServer 6.2: source is located on citrix.com within the downloads section, and on the xenserver.org download page
XenServer 6.2 hotfixes: source is located within the zip file containing the hotfix
XenServer 6.2 SP1: source is located within the zip file containing the service pack
XenServer trunk: source is located in the XenServer GitHub repository
XenServer nightly builds: source is located in the XenServer GitHub repository
XenCenter 6.1 and prior: source is not available
XenCenter 6.2 and later: source is located in the XenServer GitHub repository, and all XenCenter 6.2 versions are built from trunk
XenServer optional components: not all optional components are open source. For components which are open source, the source will be available with the component. Note that source code from third parties may require a license from the third party to obtain source (e.g. proprietary headers)

 

So what does this mean for specific feature work, and more importantly the next major version of XenServer? If the work being performed occurs within the XenServer 6.2 branch (for example as a hotfix), then that work will continue to be performed as it always has, and source will be posted with that release. The only exception to that is work on XenCenter, which is always occurring in trunk. Work for the next major release will occur in trunk as it currently does, but specific feature implementations in trunk shouldn't be considered "ready" until we actually release. In practice that means we may have some proof of concept stuff get committed, and we may decide that proof of concept work isn't compatible with newer work and refactor things before the release. I hope this clears things up a little, and there is now a better understanding of where a given feature can be found.


XenServer Driver Disks

Why do we need them

An important consideration for any OS is how it supports hardware. One part of that support is the management of device drivers used to talk to network cards, storage controllers and other such devices.

For each XenServer release, Citrix works with hardware vendors to try to get the 'latest and greatest' (and obviously stable!) drivers for supporting upcoming hardware inbox, which is for the most part great. Customers can successfully install XenServer on their new hardware, and they don't need to think about drivers at all.

This is all good and well except for the fact that sometimes release schedules don't align so perfectly, and several months after a release an OEM may start shipping new hardware that requires updated drivers. Or perhaps a customer discovers an issue with their system which is caused by a bug in the inbox device driver.

In either case, we need a mechanism to be able to provide the customer with a new driver: enter driver disks.

What are they

A 'Driver Disk' is a particular instance of a 'Supplemental Pack', which is the mechanism that XenServer uses for supporting the installation of RPMs in Dom0.

The format of a driver disk is simply a collection of RPMs, along with some metadata regarding the pack's contents and pack dependencies (obviously RPMs include their own dependency metadata too).

These driver disks are built by hardware vendors using the 'Driver Development Kit' (DDK), which contains a Dom0 kernel to build drivers against.

The vendor simply:

  1. Creates a Makefile/specfile for generating a driver RPM.
  2. Uses build-supplemental-pack.sh to compile the driver RPM, and build an ISO with the appropriate metadata.

The vendor can then use the ISO to install the new drivers on an appropriate XenServer version. Once the hardware has been certified (using our self-certification kits) with the new drivers, the vendor can go ahead and submit the drivers to Citrix, who will make them available for customers to download from support.citrix.com.
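As a rough illustration of what that installation step looks like on a 6.x host, the sketch below mounts the ISO in dom0 and runs it. The ISO path is a placeholder, and this assumes the pack ships the usual install.sh entry point:

# Mount the driver disk ISO in dom0 and run its installer (illustrative).
mkdir -p /mnt/driverdisk
mount -o loop,ro /root/driver-disk.iso /mnt/driverdisk
cd /mnt/driverdisk && ./install.sh
cd / && umount /mnt/driverdisk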

Respins

The astute among you who are familiar with the Citrix support pages may have noticed that we don't just release a single driver disk for any given XenServer release; we actually release multiple.

The reason for this is that each driver is built against a specific kernel Application Binary Interface (ABI), which has a particular version number. When this ABI version is bumped, because of modifications to the kernel in a kernel hotfix, the new kernel will no longer load a driver module built against the old ABI.

This is why customers might encounter the following scenario:

  1. Customer installs XenServer 6.2 with driver foo x.y.z
  2. Customer installs a driver disk for foo x.y.z+1
  3. Customer installs a kernel hotfix
  4. Customer notices that XenServer is now using foo x.y.z (!)

In this scenario, because the customer has installed a kernel hotfix on 6.2 without installing a re-spun driver disk for foo x.y.z+1, the new kernel shipped with the kernel hotfix will load the driver included in the GA version of 6.2. This behaviour exists to prevent customers from being forced to take driver updates with kernel hotfixes.

So for each kernel hotfix, Citrix will re-spin the driver disks previously released for that product version and post them on support.citrix.com for customers to install. This gives customers maximum flexibility in choosing which kernel hotfixes and drivers they want to take, independently of one another.
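To confirm which kernel and driver version a host is actually running after applying hotfixes, a quick check from dom0 is usually enough (substitute your real module name for the hypothetical foo used above):

# Show the running kernel and the version of the loaded foo driver module.
uname -r
modinfo foo | grep -i '^version'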

More info

If you want to find out more about the packaging of driver disks, or in fact how to build supplemental packs, then please check out the official documentation here:

 

http://docs.vmd.citrix.com/XenServer/6.2.0/1.0/en_gb/supplemental_pack_ddk.html     


How-to: Installing XenServer at Scale

Once upon a time, in a time far, far away (don’t most good stories start this way?) XenServer was so easy to get installed and running that we promoted it as “Ten Minutes to Xen”.  While this is still often the case for small installations, even ten minutes can be problematic for some, and even more so when hundreds of hosts are involved.  In this article, we’ll expand upon the XenServer Quick Installation Guide and show how you can scale out your XenServer environment quickly using a scripting model, and ensure you have correct monitoring and logging in place by default. 

Assumptions

This article assumes you’ve already installed XenServer on one server and validated that no additional drivers are required.  It also assumes that you’ve configured your server BIOS settings to be identical across all servers, and that PXE is supported on the NIC used as the management network.  One key item in the preparation is that the servers are set to boot in legacy BIOS mode and not UEFI.

Preparation steps

1.       Download the XenServer installation ISO media: http://xenserver.org/open-source-virtualization-download.html

2.       Extract the entire contents of the XenServer installation ISO to an HTTP, FTP or NFS location (in this example we’ll be using NFS)

3.       Collect the following information

Hostname: xenserver
Root password: password
Keyboard locale: us
NTP server address: 0.us.pool.ntp.org
DNS server address: dns.local
Time zone:  America/New_York (supported time zones in RHEL)
Location of extracted ISO file: nfsserver:/
TFTP server IP address: pxehost

Configuring TFTP server to supply XenServer installer

1.       In the /tftpboot directory create a new directory called xenserver

2.       Copy the mboot.c32 and pxelinux.0 files from the /boot/pxelinux directory of the XenServer ISO file to the /tftpboot directory

3.       Copy the install.img file from the root directory of the XenServer ISO file to the /tftpboot/xenserver directory

4.       Copy the vmlinuz and xen.gz files from the /boot directory of the XenServer ISO file to the /tftpboot/xenserver directory

5.       In the /tftpboot directory, create a new directory called pxelinux.cfg

The above steps are covered in this script: 

mkdir /mnt/xsinstall
mount [XenServer ISO Extract Location] /mnt/xsinstall
cd /tftpboot
mkdir xenserver
cp /mnt/xsinstall/boot/pxelinux/mboot.c32 ./
cp /mnt/xsinstall/boot/pxelinux/pxelinux.0 ./
cp /mnt/xsinstall/install.img ./xenserver
cp /mnt/xsinstall/boot/vmlinuz ./xenserver 
cp /mnt/xsinstall/boot/xen.gz ./xenserver 

6.       In the /tftpboot/pxelinux.cfg directory create a new configuration file called default

7.       Edit the default file to contain the following information.  Note that this configuration includes remote logging to a SYSLOG server.

default xenserver-auto
label xenserver-auto
	kernel mboot.c32
	append xenserver/xen.gz dom0_max_vcpus=1-2 dom0_mem=752M,max:752M com1=115200,8n1 console=com1,vga --- xenserver/vmlinuz xencons=hvc console=hvc0 console=tty0 answerfile=http://[pxehost]/answerfile.xml remotelog=[SYSLOG] install --- xenserver/install.img 

8.       Unattended installation of XenServer requires an answer file.  Place the answer file in the root directory of your NFS server.  Please note that there are many more options than are listed here, but this will suffice for most installations.

  
<?xml version="1.0"?>
<installation mode="fresh" srtype="lvm">
  <bootloader>extlinux</bootloader>
  <primary-disk gueststorage="yes">sda</primary-disk>
  <keymap>[keyboardmap]</keymap>
  <hostname>[hostname]</hostname>
  <root-password>[password]</root-password>
  <source type="nfs">[XenServer ISO Extract Location]</source>
  <admin-interface name="eth0" proto="dhcp"/>
  <name-server>dns.local</name-server>
  <timezone>[Time zone]</timezone>
  <time-config-method>ntp</time-config-method>
  <ntp-server>[NTP Server Address]</ntp-server>
  <script stage="filesystem-populated" type="nfs">[XenServer ISO Extract Location]/post-install-script.sh</script>
</installation>

Configuring the post installation scripts 

1.       In the root directory of the XenServer ISO extract location, create a file named post-install-script.sh with the following contents.  This script will run after a successful installation, and copies a first-boot script into place for post-installation configuration.

 

#!/bin/sh
touch $1/tmp/post-executed
mkdir $1/mnt/xsinstall
mount [XenServer ISO Extract Location] $1/mnt/xsinstall
cp $1/mnt/xsinstall/first-boot-script.sh $1/var/xen/fbs.sh
chmod 777 $1/var/xen/fbs.sh
ln -s /var/xen/fbs.sh $1/etc/rc3.d/S99zzzzpostinstall

2.       In the root directory of the XenServer ISO extract location, create a file named first-boot-script.sh with whatever steps you need to configure XenServer for your environment.  In the script below, we take care of the following cases:

a.       Assign a unique, human understandable hostname based on the assigned IP address

b.      Configure a dedicated storage network which uses Jumbo frames

c.       Configure centralized logging using SYSLOG

d.      Configure network monitoring using NetFlow

e.      Apply a socket based license

f.        Remove the first-boot script to ensure it doesn’t run on subsequent reboots

 

#!/bin/bash
# Wait before start
sleep 60
 
# Get current hostname which then gets us the host-uuid
HOSTNAME=$(hostname)
HOSTUUID=$(xe host-list name-label=$HOSTNAME --minimal)
 
# Get the management pif UUID which gets us the IP address
MGMTPIFUUID=$(xe pif-list params=uuid management=true host-name-label=$HOSTNAME --minimal)
MGMTIP=$(xe pif-param-list uuid=$MGMTPIFUUID | grep 'IP ' | sed -n 's/.*: \([0-9.]*\).*/\1/p')
 
# From the IP address, get the zone and host
ZONE=$(echo "$MGMTIP" | awk -F: '{ split($1,a,"."); printf ("%dn", a[3]); }')
HOST=$(echo "$MGMTIP" | awk -F: '{ split($1,a,"."); printf ("%dn", a[4]); }')
 
# Configure SYSLOG
xe host-param-set uuid=$HOSTUUID logging:syslog_destination=[SYSLOG]
xe host-syslog-reconfigure host-uuid=$HOSTUUID
 
# Assign License to server
xe host-apply-edition edition=per-socket host-uuid=$HOSTUUID license-server-address=[LicenseServer] license-server-port=27000
 
# Setup storage network. For us, that’s on eth1 (aka xenbr1)
STORAGEPIFUUID=$(xe pif-list params=uuid host-name-label=$HOSTNAME device=eth1 --minimal)
xe pif-reconfigure-ip mode=static uuid=$STORAGEPIFUUID ip=192.168.$ZONE.$HOST netmask=255.255.255.0
xe pif-param-set disallow-unplug=true uuid=$STORAGEPIFUUID
xe pif-param-set other-config:management_purpose="Storage" uuid=$STORAGEPIFUUID
NETWORKUUID=$(xe network-list params=uuid bridge=xenbr1 --minimal)
xe network-param-set uuid=$NETWORKUUID MTU=9000
 
# Setup NetFlow monitoring on the 4 network bridges in our hosts
ovs-vsctl -- set Bridge xenbr0 netflow=@nf -- --id=@nf create NetFlow targets=\"192.168.0.34:5566\" active-timeout=30
ovs-vsctl -- set Bridge xenbr1 netflow=@nf -- --id=@nf create NetFlow targets=\"192.168.0.34:5566\" active-timeout=30
ovs-vsctl -- set Bridge xenbr2 netflow=@nf -- --id=@nf create NetFlow targets=\"192.168.0.34:5566\" active-timeout=30
ovs-vsctl -- set Bridge xenbr3 netflow=@nf -- --id=@nf create NetFlow targets=\"192.168.0.34:5566\" active-timeout=30
 
# Rename host in both XenServer and for XenCenter
NEWHOSTNAME=$(echo $HOSTNAME$ZONE-$HOST)
xe host-set-hostname-live host-uuid=$HOSTUUID host-name="$NEWHOSTNAME"
xe host-param-set uuid=$HOSTUUID name-label="$NEWHOSTNAME"
# Disable first boot script for subsequent reboots
rm -f /etc/rc3.d/S99zzzzpostinstall
# Final reboot
reboot

Configuring the network

There are several considerations we need to account for in our network design. 

1.       The XenServer management networks cannot be tagged within XenServer.  To work around this, the network ports will need to have a default VLAN assigned to them. 

2.       The storage management network is using jumbo frames and will need an MTU of 9000

3.       The TFTP server will need to be on the primary management network

4.       Since we will want to have persistent control over the XenServer hosts and their VMs, we will want each XenServer to use a static address.  In order to accomplish this with DHCP, we’ll need to configure our DHCP service to use static MAC address reservations.  A sample dhcpd.conf is provided below:

authoritative;
dns-update-style interim;
default-lease-time 28800;
max-lease-time 28800;
 
        option routers                  10.10.2.1;
        option broadcast-address        10.10.2.255;
        option subnet-mask              255.255.255.0;
        option domain-name-servers      10.10.2.2, 10.10.2.3;
        option domain-name              "xspool.local";
 
        subnet 10.10.2.0 netmask 255.255.255.0 {
             pool {
                range 10.10.2.50 10.10.2.250;
 
# one host entry following our naming convention
                host xenserver2-50 {
                  hardware ethernet 00:11:22:33:44:55;
                  fixed-address 10.10.2.50;
                }
                host xenserver2-51 {
                  hardware ethernet 00:11:22:33:44:56;
                  fixed-address 10.10.2.51;
                }
                host xenserver2-52 {
                  hardware ethernet 00:11:22:33:44:57;
                  fixed-address 10.10.2.52;
                }
# prevent unknown hosts from polluting the pool
                deny unknown-clients;
             }
        }

Booting the servers to perform the install

Since our objective is to perform a scale installation using scripting, we also need to script the PXE boot of our servers, and ensure the PXE boot is a first boot only (i.e. we’re not continuously reinstalling on each reboot).  Thankfully remote access cards provide this capability, and I'm currently compiling a set of scripts to cover as many vendors as I can. 
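As one hedged example, many remote access cards also speak IPMI, so a one-time PXE boot followed by a power cycle can be requested with ipmitool. The address and credentials below are placeholders, and on most BMCs the boot device override only applies to the next boot:

# Ask the BMC for a one-shot PXE boot, then power cycle the server.
ipmitool -I lanplus -H 10.10.2.200 -U admin -P password chassis bootdev pxe
ipmitool -I lanplus -H 10.10.2.200 -U admin -P password chassis power cycle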

Tying it all together

In this article you've seen how easy it is to deploy a large number of XenServer hosts consistently.  That's not the end of things, and over the coming weeks I'll be posting guides covering many more scale operations with XenServer.

 


XenServer: code highlights from 2013

For me, the biggest event of 2013 was undoubtedly the open-sourcing of xenserver in June. By volume, about 99% of xenserver was already open-source (xen, Linux, CentOS, xapi etc), but nevertheless it was great to finally see the code for xencenter and the Windows PV drivers: win-xeniface, win-xennet, win-xenvif, win-xenvbd, and even the awesome test system, xenrt, finally open-sourced.

Of course, the action certainly didn’t stop there. Not only were the Windows PV drivers open-sourced, but Paul, Ben and Owen completely overhauled them to make them compatible with upstream xen. Previously the drivers relied on a customisation contained within the xenserver patch queue. Now the drivers should work well on every xen system.

Virtualising graphics... the right way

In another exciting development, Paul's work on creating multiple device emulators for HVM guests enabled safe sharing of physical GPUs among VMs, a feature we call vGPU. Just as xen allows its components to be isolated in separate VM containers (known as dom0 disaggregation), it’s exciting to see the isolation being taken to the level of individual virtual PCI devices. (I’m hoping to try writing my own virtual PCI device sometime in 2014)

User interfaces

Continuing with the Windows theme, at the top of the xenserver stack, the XenCenter interface has received several great usability enhancements. It has been redesigned to simplify the user experience for navigation between different views of resources and for viewing different types of notifications. This was all thanks to the hard work of Tina (expect another blog on this subject soon!)

Scaling up

2013 was also a great year for xenserver scalability. It’s quite a challenge making a system as complex as xenserver scale well: you have to have a deep understanding of the whole system in order to find -- and fix -- all the important bottlenecks. Thanks to the laser-like focus of Felipe, the storage datapath has been extensively analysed and understood. Meanwhile, large increases in basic system resources such as David’s new event channel ABI, a reduction in the number of grant references needed by disabling receive-side copy, and the absorption of upstream xen goodness such as Wei’s patch to use poll(2) in consoled have led to big improvements in VM density.

XenServer: the distro

The xenserver distro is the foundation upon which everything else is -- literally -- based. Anyone who has downloaded one of the regular development snapshot builds (thanks to Craig and Peter for organising those) should have noticed that it has been recently rebased on top of CentOS 6.4 with a shiny new Linux 3.x kernel and xen 4.3. This means that we have access to new hardware drivers, access to more modern tools (e.g. newer versions of python) and lots of other great stuff.

(No-one likes) patch queues

Speaking of the distro, I have to mention the “patch queue problem”. Patch queues are a sequence of source code customisations applied to an “upstream” (e.g. the official xen-4.3 release) to produce the version we actually use. Patch queues are important tools for distro builders. They can be used for good (e.g. backporting important security fixes) and for evil (e.g. forward-porting stuff that shouldn’t exist: “technical debt” in its most concrete form). Every time a new upstream release comes out, the patch queue needs careful rebasing against the new release -- this can be very time-consuming. In recent years, the xenserver xen patch queue had grown to such a large size that it was almost blocking us from moving xenserver to more recent versions of xen. I’m happy to report that the past year has seen heroic efforts from Andy, Malcolm and David to reduce it to more manageable levels. Andy tells me that while it took more than 1 year (!) to rebase and fix xenserver from xen 3.4 to 4.1; and then -- a still surprising -- 3 months to get from 4.1 to 4.2; it recently only took 3 days to rebase from 4.2 to 4.3! Phew!

Build and packaging

Our goal is to get to a world where the xenserver.iso is simply a respin of a base (CentOS) distro with an extra repo of packages and overrides on top. Therefore in 2013 we made a concerted effort to clean up our xenserver distro build and packaging more generally. Thanks to Euan, Jon and Frediano we're now using standard build tools like mock and rpmbuild. In the past we cut corners by either leaving files unpackaged (bad) or applying large patch queues in the packages (terrible, as we’ve seen already). To help sort this out, Euan created a set of experimental RPM and .deb packages for the toolstack, shook out the bugs and forced us to fix things properly. As a result we’ve found and fixed lots of portability problems in the upstream software (e.g. hard-coded CentOS paths which break on Debian), which should make the lives of other distro package maintainers easier.

As a side-benefit, we’ve also been able to release bleeding-edge packages containing prototypes of new features, such as ceph storage released as a tech preview in July, based on libvirt and Richard Jones' excellent OCaml bindings.

New toolstack version

Next on my list, xenserver picked up a refreshed version of xapi with lots of improvements, my personal favourites being Rob's port of xenopsd to libxl; enhanced disk copying APIs tailored for cloud use-cases (thanks to Zheng, Si, Dave); and support for enabling host GRO (thanks again to Rob) and more IPv6 (thanks to both Rob and Euan).

Keen dom0 watchers will notice that “xapi” has split into multiple daemons including a stand-alone host network configuration daemon and a stand-alone statistics collection and monitoring daemon. These services are designed to be usable independently (even without the xapi mothership installed) and, since they use many of the OCaml libraries for high-performance type-safe I/O from the openmirage project, are candidates for running as specialised xen kernels in a fully-disaggregated dom0.

Last, but certainly not least, xenserver gained many, many bug-fixes making it into an even-more robust platform to which you can trust your infrastructure. Working on xenserver in 2013 was really fun and I’m looking forward to (the rest of) 2014!


Project Karcygwins and Virtualised Storage Performance

Introduction

Over the last few years we have witnessed a revolution in terms of storage solutions. Devices capable of achieving millions of Input/Output Operations per Second (IOPS) are now available off-the-shelf. At the same time, Central Processing Unit (CPU) speeds remain largely constant. This means that the overhead of processing storage requests is actually affecting the delivered throughput. In a world of virtualisation, where extra processing is required in order to securely pass requests from virtual machines (VM) to storage domains, this overhead becomes more evident.

It is the first time that such an overhead became a concern. Until recently, the time spent within I/O devices was much longer than that of processing a request within CPUs. Kernel and driver developers were mainly worried about: (1) not blocking while waiting for devices to complete; and (2) sending requests optimised for specific device types. While the former was addressed by techniques such as Direct Memory Access (DMA), the latter was solved by elevator algorithms such as Completely Fair Queueing (CFQ).

Today, with the large adoption of Solid-State Drives (SSD) and the further development of low-latency storage solutions such as those built on top of PCI Express (PCIe) and Non-Volatile Memory (NVM) technologies, the main concern lies in not losing any unnecessary time in processing requests. Within the Xen Project community, some development has already started in order to allow scalable storage traffic from several VMs. Linux kernel maintainers and storage manufacturers are also working on similar issues. In the meantime, XenServer Engineering delivered Project Karcygwins, which allowed a better understanding of current bottlenecks, when they are evident and what can be done to overcome them.

Project Karcygwins

Karcygwins was originally intended as three separate projects (Karthes, Cygni and Twins). Due to their topics being closely related, they were merged. Those three projects were proposed based on subjects believed to be affecting virtualised storage throughput.

Project Karthes aimed at assessing and mitigating the cost in mapping (and unmapping) memory between domains. When a VM issues an I/O request, the storage driver domain (dom0 in XenServer) requires access to certain memory areas in the guest domain. After the request is served, these areas need to be released (or unmapped). This is also an expensive operation due to flushes required in different cache tables. Karthes was proposed to investigate the cost related to these operations, how they impacted the delivered throughput and what could be done to mitigate them.

Project Cygni aimed at allowing requests larger than 44 KiB to be passed between a guest and a storage driver domain. Until recently, Xen's blkif protocol defined a fixed array of data segments per request. This array had room for 11 segments corresponding to a 4 KiB memory page each (hence the 44 KiB). The protocol has since been updated to support indirect I/O operations where the segments actually contained other segments. This change allowed for much larger requests at a small expense.

Project Twins aimed at evaluating the benefits of using two communication rings between dom0 and a VM. Currently, only one ring exists and it is used both for requests from the guests and responses from the back end. With two rings, requests and responses can be stored in their own ring. This new strategy allows for larger inflight data and better use of caching.

Due to initial findings, the main focus of Karcygwins stayed on Project Karthes. The code allowing for requests larger than 44 KiB, however, was constantly included in the measurements to address the goals proposed for Project Cygni. The idea of using split rings (Project Twins) was postponed and will be investigated at a later stage.

Visualising the Overhead

When a user installs a virtualisation platform, one of the first questions to be raised is: "what is the performance overhead?". When it comes to storage performance, a straightforward way to quantify this overhead is to measure I/O throughput on a bare metal Linux installation and repeat the measurement (on the same hardware) from a Linux VM. This can promptly be done with a generic tool like dd for a variety of block sizes. It is a simple test that does not cover concurrent workloads or greater IO depths.
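As an illustration, a sequential read sweep with dd might look like the sketch below. The device name is an example, iflag=direct bypasses the page cache so repeated runs measure the disk rather than RAM, and the counts should be scaled so each run transfers a comparable amount of data:

# Sequential read throughput at several block sizes (illustrative device name).
for BS in 4k 16k 64k 256k 1M 4M
do
    echo "Block size: $BS"
    dd if=/dev/sdb of=/dev/null bs=$BS count=2000 iflag=direct 2>&1 | grep copied
done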

[Figure: karcyg-fig0.png]

Looking at the plot above we can see that, on a 7.2k RPM SATA Western Digital Blue WD5000AAKX, read requests as large as 16 KiB can reach the maximum disk throughput at just over 120 MB/s (red line). When repeating the same test from a VM (green and blue lines), however, we see that the throughput for small requests is much lower. They eventually reach the same 120 MB/s mark, but only with larger requests.

The green line represents the data path where blkback is directly plugged to the back end storage. This is the kernel module that receives requests from the VM. While this is the fastest virtualisation path in the Xen world, it lacks certain software-level features such as thin-provisioning, cloning, snapshotting and the capability of migrating guests without centralised storage.

The blue line represents the data path where requests go through tapdisk2. This is a user space application that runs in dom0 and can implement the VHD format. It also has an NBD plugin for migration of guests without centralised storage. It allows for thin-provisioning, cloning and snapshotting of Virtual Disk Images (VDI). Because requests traverse more components before reaching the disk, it is understandably slower.

Using Solid-State Drives and Fast RAID Arrays

The shape of the plot above is not the same for all types of disks, though. Modern disk setups can achieve considerably higher data rates before flattening their throughputs.

[Figure: karcyg-fig1.png]

Looking at the plot above, we can see a similar test executed from dom0 on two different back end types. The red line represents the throughput obtained from a RAID0 formed by two SSDs (Intel DC S3700). The blue line represents the throughput obtained from a RAID0 formed by two SAS disks (Seagate ST). Both arrays were measured independently and are connected to the host through a PERC H700 controller. While the Seagate SAS array achieves its maximum throughput at around 370 MB/s when using 48 KiB requests, the Intel SSD array continues to speed up even with requests as large as 4 MiB. Focusing on each array separately, it is possible to compare these dom0 measurements with measurements obtained from a VM. The plot below isolates the Seagate SAS array.

[Figure: karcyg-fig2.png]

Similar to what is observed on the measurements taken on a single Western Digital, the throughput measured from a VM is smaller than that of dom0 when requests are not big enough. In this case, the blkback data path (the pink line) allows the VM to reach the same throughput offered by the array (370 MB/s) with requests larger than 116 KiB. The other data paths (orange, cyan and brown lines) represent user space alternatives that reach different bottlenecks and even with large requests cannot match the throughput measured from dom0.

It is interesting to observe that some user space implementations vary considerably in terms of performance. When using qdisk as the back end along with the blkfront driver from the Linux kernel 3.11.0 (the orange line), the throughput is higher for requests of sizes such as 256 KiB (when compared to other user space alternatives -- the blkback data path remains faster). The main difference in this particular setup is the support for persistent grants. This technique, implemented in 3.11.0, reuses memory grants and drastically reduces the map and unmap operations. It requires, however, an additional copy operation within the guest. The trade-off may have different implications when varying factors such as hardware architecture and workload types. More on that in the next section.

[Figure: karcyg-fig3.png]

When repeating these measurements on the Intel SSD array, a new issue came to light. Because the array delivers higher throughput with no signs of abating as larger requests are issued, none of the virtualisation technologies are capable of matching the throughput measured from dom0. While this behaviour will probably differ with other workloads, this is what has been observed when using a single I/O thread with queue depth set to one. In a nutshell, 2 MiB read requests from dom0 achieve 900 MB/s worth of throughput while a similar measurement from one VM will only reach 300 MB/s when using user space back ends. This is a pathological example chosen for this particular hardware architecture to show how bad things can get.

Understanding the Overhead

In order to understand why the overhead is so evident in some cases, it is necessary to take a step back. The measurements taken on slower disks show that all virtualisation technologies are somewhat slower than what is observed in dom0. On such disks, this difference disappears as requests grow in size. What happens at that point is that the actual disk becomes "maxed out" and cannot respond faster no matter the request size. At the same time, much of the work done at the virtualisation layers does not get slower proportionally to the amount of data associated with requests. For example, interrupts between domains are unlikely to take longer simply because requests are bigger. This is exactly why there is no visible overhead with large enough requests on certain disks.

However, the question remains: what is consuming CPU time and causing such a visible overhead on the example previously presented? There are mainly two techniques that can be used to answer that question: profiling and tracing. Profiling allows instruction pointer samples to be collected at every so many events. The analysis of millions of such samples reveals code in hot paths where time is being spent. Tracing, on the other hand, measures the exact time passed between two events.

For this particular analysis, the tracing technique and the blkback data path have been chosen. To measure the amount of time spent between events, the code was actually modified and several RDTSC instructions were inserted. These instructions read the Time Stamp Counters (TSC) and are relatively cheap while providing very accurate data. On modern hardware, TSCs are constant and consistent across the cores of a host. This means that measurements from different domains (i.e. dom0 and guests) can be matched to obtain the time elapsed between events in different domains (for example, blkfront kicking blkback). The diagram below shows where trace points have been inserted.

[Figure: karcyg-fig4.png]

In order to gather meaningful results, 100 requests have been issued in succession. Domains have been pinned to the same NUMA node in the host and turbo capabilities were disabled. The TSC readings were collected for each request and analysed both individually and as an average. The individual analysis revealed interesting findings such as a "warm up" period where the first requests are always slower. This was attributed to caching and scheduling effects. It also showed that some requests were randomly faster than others in certain parts of the path. This was attributed to CPU affinity. For the average analysis, the 20 fastest and slowest requests were initially discarded. This produced more stable and reproducible results. The plots below show these results.

[Figure: karcyg-fig5.png]

[Figure: karcyg-fig6.png]

Without persistent grants, the cost of mapping and unmapping memory across domains is clearly a significant factor as requests grow in size. With persistent grants, the extra copy on the front end adds up and results in a slower overall path. Roger Pau Monne, however, showed that persistent grants can improve aggregate throughput from multiple VMs as they reduce contention on the grant tables. Matt Wilson, following on from discussions at the Xen Developer Summit 2013, produced patches that should also ease grant table contention.

Conclusions and Next Steps

In summary, Project Karcygwins provided a better understanding of several key elements in storage performance for both Xen and XenServer:

  • The time spent in processing requests (in CPU) definitely matters as disks get faster
  • Throughput is visibly affected for single-threaded I/O on low-latency storage
  • Kernel-only data paths can be significantly faster
  • The cost of mapping (and unmapping) grants is the most significant bottleneck at this time

It also raised attention on such issues within the Linux and Xen Project communities by sharing these results over a series of presentations and discussions.

Next, new research projects are scheduled (or already underway) to:

  • Look into new ideas for low-latency virtualised storage
  • Investigate bottlenecks and alternatives for aggregate workloads
  • Reduce the overall CPU utilisation of processing requests in user space

Have a happy 2014 and thanks for reading!


XenServer Status – November 2013

The progress towards fulfilling the goal of making XenServer a proper open source project continues, but this month much of the work isn’t visible yet.  The big process improvements will hopefully be unveiled in late December or early January when we get our long needed wiki and defect trackers online.  The logical question of course is why it’s taking so long to get them out there.  After all, we obviously do have the content, so why not just make it all public and be done?  Unfortunately, there is no magic wand to remove customer-sensitive information, or to ensure that designs linked to closed source development on other Citrix products, or information provided to Citrix by partners under NDA, aren’t accidentally made public.  It’s painstaking work and we want to get it right.

In terms of partner announcements, we’ve been focusing on the NVIDIA vGPU work, as well as security efforts.

-          “Kaspersky trusted status” awarded to XenServer Windows Tools: http://blogs.citrix.com/2013/11/14/citrix-xenserver-windows-tools-awarded-kaspersky-trusted-status-plus-a-security-ecosystem-update/

-          SAP 3D Enterprise on XenDesktop on XenServer powered by NVIDIA GRID: http://blogs.citrix.com/2013/11/15/vgpu-sap-3d-visual-enterprise-the-potential-for-mobile-cadplm-xendesktop-on-xenserver-powered-by-nvidia-grid/

-          The XenServer HCL has been expanded to include new servers from HP, Hitachi, Supermicro, Huawei, Lenovo and Fujitsu, storage devices from QNAP, Nexsan and Hitachi Data Systems, storage adapters from IBM and QLogic plus two CNAs from Emulex.

When I posted the project status last month, we had some significant gains, and this month is no different.  Compared to October:

-          Unique visitors were up 30% to 34,000

-          xenserver.org page views were up 21% to over 110,000

-          Downloads of the XenServer installer were up by 7,000

-          We had over 110 commits to the XenServer repositories.

What’s most interesting about these stats isn’t the growth, which I do love, but that we’re getting to a point where the activity level is starting to feel right for a project of our maturity.  Don’t get me wrong, I still am looking for lots more growth, but I’m also looking for sustained activity.  That’s why I’m looking more at how XenServer interacts with its community, and what can be done to improve the relationship.  In my Open@Citrix blog, I asked the question “What kind of community do you want?”.  In my mind, everyone has a voice; it’s just up to you to engage with us.  I’d like to hear what you want from us, and that’s both the good and the bad.  If you have a community you’d like us to be involved with, I’d also like to hear about that too. 

Here is how I define the XenServer community:

The XenServer community is an independent group working to common purpose with a goal of leveraging each other to maximize the success of the community.  Members are proud to be associated with the community.

 

We all have a role to play in the future success of XenServer, and while I have the twitter handle of @XenServerArmy, I view my role as supporting you.  If there is something which is preventing you from adopting XenServer, or being as successful with XenServer as you intended, I want to know.  I want to remove as many barriers to adopting XenServer as I can, and I am your voice within the XenServer team at Citrix.  Please be vocal in your support, and vocal with what you need.

