Virtualization Blog

Discussions and observations on virtualization.

Better Together: PVS and XenServer!

XenServer adds new functionality to further simplify and enhance the secure and on-demand delivery of applications and desktops to enterprise environments.

If you haven't visited the Citrix blogs recently, we encourage you to visit https://www.citrix.com/blogs/2016/10/31/pvs-and-xenserver-better-together/ to read about the latest integration efforts between PVS and XS.

If you're a Citrix customer, this article is a must read!

Andy Melmed, Senior Solutions Architect, XenServer PM

 

Continue reading
5513 Hits
0 Comments

XenServer 7.0 performance improvements part 4: Aggregate I/O throughput improvements

The XenServer team has made a number of significant performance and scalability improvements in the XenServer 7.0 release. This is the fourth in a series of articles that will describe the principal improvements. For the previous ones, see:

  1. http://xenserver.org/blog/entry/dundee-tapdisk3-polling.html
  2. http://xenserver.org/blog/entry/dundee-networking-multi-queue.html
  3. http://xenserver.org/blog/entry/dundee-parallel-vbd-operations.html

In this article we return to the theme of I/O throughput. Specifically, we focus on improvements to the total throughput achieved by a number of VMs performing I/O concurrently. Measurements show that XenServer 7.0 enjoys aggregate network throughput over three times faster than XenServer 6.5, and also has an improvement to aggregate storage throughput.

What limits aggregate I/O throughput?

When a number of VMs are performing I/O concurrently, the total throughput that can be achieved is often limited by dom0 becoming fully busy, meaning it cannot do any additional work per unit time. The I/O backends (netback for network I/O and tapdisk3 for storage I/O) together consume 100% of available dom0 CPU time.

How can this limit be overcome?

Whenever there is a CPU bottleneck like this, there are two possible approaches to improving the performance:

  1. Reduce the amount of CPU time required to perform I/O.
  2. Increase the processing capacity of dom0, by giving it more vCPUs.

Surely approach 2 is easy and will give a quick win...? Intuitively, we might expect the total throughput to increase proportionally with the number of dom0 vCPUs.

Unfortunately it's not as straightforward as that. The following graph shows what happened to the aggregate network throughput on XenServer 6.5 if the number of dom0 vCPUs is artificially increased. (In this case, we are measuring the total network throughput of 40 VMs communicating amongst themselves on a single Dell R730 host.)

b2ap3_thumbnail_5179.png

Counter-intuitively, the aggregate throughput decreases as we add more processing power to dom0! (This explains why the default was at most 8 vCPUs in XenServer 6.5.)

So is there no hope for giving dom0 more processing power...?

The explanation for the degradation in performance is that certain operations run more slowly when there are more vCPUs present. In order to make dom0 work better with more vCPUs, we needed to understand what those operations are, and whether they can be made to scale better.

Three such areas of poor scalability were discovered deep in the innards of Xen by Malcolm Crossley and David Vrabel, and improvements were made for each:

  1. Maptrack lock contention – improved by http://xenbits.xen.org/gitweb/?p=xen.git;a=commit;h=dff515dfeac4c1c13422a128c558ac21ddc6c8db
  2. Grant-table lock contention – improved by http://xenbits.xen.org/gitweb/?p=xen.git;a=commitdiff;h=b4650e9a96d78b87ccf7deb4f74733ccfcc64db5
  3. TLB flush on grant-unmap – improved by https://github.com/xenserver/xen-4.6.pg/blob/master/master/avoid-gnt-unmap-tlb-flush-if-not-accessed.patch

The result of improving these areas is dramatic – see the green line in the following graph:

b2ap3_thumbnail_4190.png

Now, throughput scales very well as the number of vCPUs increases. This means that, for the first time, it is now beneficial to allocate many vCPUs to dom0 – so that when there is demand, dom0 can deliver. Hence we have given XenServer 7.0 a higher default number of dom0 vCPUs.

How many vCPUs are now allocated to dom0 by default?

Most hosts will now get 16 vCPUs by default, but the exact number depends on the number of CPU cores on the host. The following graph summarises how the default number of dom0 vCPUs is calculated from the number of CPU cores on various current and historic XenServer releases:

b2ap3_thumbnail_dom0-vcpus.png

Summary of improvements

I will conclude with some aggregate I/O measurements comparing XenServer 6.5 and 7.0 under default settings (no dom0 configuration changes) on a Dell R730xd.

  1. Aggregate network throughput – twenty pairs of 32-bit Debian 6.0 VMs sending and receiving traffic generated with iperf 2.0.5.
    b2ap3_thumbnail_aggr-intrahost-r730_20160729-093608_1.png
  2. Aggregate storage IOPS – twenty 32-bit Windows 7 SP1 VMs each doing single-threaded, serial, sequential 4KB reads with fio to a virtual disk on an Intel P3700 NVMe drive.
    b2ap3_thumbnail_storage-iops-aggr-p3700-win7.png
Continue reading
12001 Hits
2 Comments

XenServer Administrators Handbook Published

Last year, I announced that we were working on a XenServer Administrators Handbook, and I'm very pleased to announce that it's been published. Not only have we been published, but based on the Amazon reviews to date we've done a pretty decent job. In part, I suspect that has a ton to do with the book being focused on what information you, XenServer administrators, need to be successful when running a XenServer environment regardless of scale or workload.

XenServer Administrators HandbookThe handbook is formatted following a simple premise; first you need to plan your deployment and second you need to run it. With that in mind, we start with exactly what a XenServer is, define how it works and what expectations it has on infrastructure. After all, it's critical to understand how a product like XenServer interfaces with the real world, and how its virtual objects relate to each other. We even cover some of the misunderstandings those new to XenServer might have.

While it might be tempting to go deep on some of this stuff, Jesse and I both recognized that virtualization SREs have a job to do and that's to run virtual infrastructure. As interesting as it might be to dig into how the product is implemented, that's not the role of an administrators handbook. That's why the second half of the book provides some real world scenarios, and how to go about solving them.

We had an almost limitless list of scenarios to choose from, and what you see in the book represents real world situations which most SREs will face at some point. The goal of this format being to have a handbook which can be actively used, not something which is read once and placed on some shelf (virtual or physical). During the technical review phase, we sent copies out to actual XenServer admins, all of whom stated that we'd presented some piece of information they hadn't previously known. I for one consider that to be a fantastic compliment.

Lastly, I want to finish off by saying that like all good works, this is very much a "we" effort. Jesse did a top notch job as co-author and brings the experience of someone who's job it is to help solve customer problems. Our technical reviewers added tremendously to the polish you'll find in the book. The O'Reilly Media team was a pleasure to work with, pushing when we needed to be pushed but understanding that day jobs and family take precedence.

So whether you're looking at XenServer out of personal interest, have been tasked with designing a XenServer installation to support Citrix workloads, clouds, or for general purpose virtualization, or have a XenServer environment to call your own, there is something in here for you. On behalf of Jesse, we hope that everyone who gets a copy finds it valuable. The XenServer Administrator's handbook is available from book sellers everywhere including:

Amazon: http://www.amazon.com/XenServer-Administration-Handbook-Successful-Deployments/dp/149193543X/

Barnes and Noble: http://www.barnesandnoble.com/w/xenserver-administration-handbook-tim-mackey/1123640451

O'Reilly Media: http://shop.oreilly.com/product/0636920043737.do

If you need a copy of XenServer to work with, you can obtain that for free from: http://xenserver.org/download

Recent Comments
Tobias Kreidl
A timely publication, given all the major recent enhancements to XenServer. It's packed with a lot of hands-on, practical advice a... Read More
Tuesday, 03 May 2016 03:37
Eric Hosmer
Been looking forward to getting this book, just purchased it on Amazon. Now I just need to find that mythical free time to read ... Read More
Friday, 06 May 2016 22:41
Continue reading
11774 Hits
2 Comments

XENSERVER产品DUNDEE预览版2发布

2015 刚刚结束, 2016新年开始,正是我们向XenServer社区发节日礼物的好时节。我们发布了下一代XenServer产品Dundee预览版2。我们花了大量精力解决已上报的各种问题,也导致该测试版本和九月份的预览版1相比有些延迟。我们确信该预览版和Steve Wilson博客 (https://www.citrix.com/blogs/2015/12/14/citrix-xenserver-infrastructure-strategy/) 中对思杰在XenServer项目贡献的肯定是很好的新年礼物。此外,我们已开始在xenserver.org网站提供一系列博客把主要改进点做深度介绍。对于那些更关注该预览版亮点的朋友,现在就让我在下面为您做一一介绍。

异构处理器集群

多年来XenServer一直支持用不同代的CPU创建处理器资源池,但是几年前采用因特尔CPU发生一些改变,这影响了混合使用最新的CPU和相对较老的CPU的能力。好消息是,使用Dundee预览版2,这种状况得到了彻底解决,且确确实实提高了性能。这个领域需要我们把事情完完全全的做正确,我们非常感激任何人运行Dundee体验该特性并上报成功或者上报遇到的问题。

增强的扩展性

当代的服务器致力于增强其能力,我们不仅要与时俱进,更要确保用户能创建真正反应物理服务器能力的虚拟机。Dundee预览版2 目前支持512个物理CPU(pCPU),可创建高达1.5TB内存的用户虚拟机。您也许会问是否考虑增加虚拟CPU上限,我们已经把上限扩大到32个。在Xen项目的管理程序中,我们已经默认支持PML(Page Modification Logging)。 对PML设计的详细信息已经发布在Xen项目归档中等待审核。最后,XenServer的Dom0内核版本已经升级到3.10.93。

支持新的SUSE版本

SUSE为其企业服务器版SLES(SUSE Linux Enterprise Server)和企业桌面版SLED(SUSE Linux Enterprise Desktop)发布了version 12 SP1,这两者在Dundee中都已得到支持。

安全更新

自从Dundee预览版1在九月下旬发布后,若干安全相关的热补丁已经在XenServer 6.5 SP1中发布。 同样的补丁也应用到了Dundee并且已经包含在预览版2中。

下载信息

您可以在预览页http://xenserver.org/preview) 下载Dundee预览版2,任何发现的问题都欢迎上报到我们的故障库https://bugs.xenserver.org)。

Recent Comments
Tim Mackey
Evgeny, The original English version can be found here: http://xenserver.org/discuss-virtualization/virtualization-blog/entry/xen... Read More
Saturday, 09 January 2016 13:29
xing
居然有中文版了。。。妹子。。。
Sunday, 10 January 2016 14:22
Continue reading
6350 Hits
3 Comments

XenServer Dundee Beta.2 Available

With 2015 quickly coming to a close, and 2016 beckoning, it's time to deliver a holiday present to the XenServer community. Today, we've released beta 2 of project Dundee. While the lag between beta 1 in September and today has been a bit longer than many would've liked, part of that lag was due to the effort involved in resolving many of the issues reported. The team is confident you'll find both beta 2 and Steve Wilson's blog affirming Citrix's commitment to XenServer to be a nice gift. As part of that gift, we're planning to have a series of blogs covering a few of the major improvements in depth, but for those of you who like the highlights - let's jump right in!

CPU leveling

XenServer has supported for many years the ability to create resource pools with processors from different CPU generations, but a few years back a change was made with Intel CPUs which impacted our ability mix the newest CPUs with much older ones. The good news is that with Dundee beta.2, that situation should be fully resolved, and may indeed offer some performance improvements. Since this is an area where we really need to get things absolutely correct, we'd appreciate anyone running Dundee to try this out if you can and report back on successes and issues.

Increased scalability

Modern servers keep increasing their capacity, and not only do we need to keep pace, but we need to ensure users can create VMs which mirror the capacity of a physical machines. Dundee beta.2 now supports up to 512 physical cores (pCPUs), and can create guest VMs with up to 1.5 TB RAM. Some of you might ask about increasing vCPU limits, and we've bumped those up to 32 as well. We've also enabled Page Modification Logging (PML) in the Xen Project hypervisor as a default. The full design details for PML are posted in the Xen Project Archives for review if you'd like to get into the weeds of why this is valuable. Lastly we've bumped the kernel version to 3.10.93.

New SUSE templates

SUSE have released version 12 SP1 for both (SUSE Linux Enterprise Server) SLES and (SUSE Linux Enterprise Desktop) SLED, both of which are now supported templates in Dundee.

Security updates

Since Dundee beta.1 was made available in late September, a number of security hotfixes for XenServer 6.5 SP1 have been released. Where valid, those same security patches have been applied to Dundee and are included in beta.2.

Download Information

You can download Dundee beta.2 from the Preview Download page (http://xenserver.org/preview), and any issues found can be reported in our defect database (https://bugs.xenserver.org).     

Recent Comments
Tobias Kreidl
Great news, Tim! Is it correct the 32 VCPUs are now supported for both Linux and Windows guests (where of course supported by the ... Read More
Tuesday, 22 December 2015 04:43
Tim Mackey
Thanks, Tobias. This would be 32 vCPUS for both PV and HVM. I *think* we're at 254 VDIs in Dundee, but with the caveat things mi... Read More
Thursday, 24 December 2015 01:59
Tim Mackey
N3ST, We never commit to a stable upgrade path from any preview version to a final release. It's been known to work in previous ... Read More
Thursday, 24 December 2015 02:02
Continue reading
19419 Hits
15 Comments

XenServer Dundee Alpha.3 Available

The XenServer team is pleased to announce the availability of the third alpha release in the Dundee release train. This release includes a number of performance oriented items and includes three new functional areas.

  • Microsoft Windows 10 driver support is now present in the XenServer tools. The tools have yet to be WHQL certified and are not yet working for GPU use cases, but users can safely use them to validate Windows 10 support.
  • FCoE storage support has been enabled for the Linux Bridge network stack. Note that the default network stack is OVS, so users wishing to test FCoE will need to convert the network stack to Bridge and will need to be aware of the feature limitations in Bridge relative to OVS.
  • Docker support present in XenServer 6.5 SP1 is now also present in Dundee

Considerable work has been performed to improve overall I/O throughput on larger systems and improve system responsiveness under heavy load. As part of this work, the number of vCPUs available to dom0 have been increased on systems with more than 8 pCPUs. Early results indicate a significant improvement in throughput compared to Creedence. We are particularly interested in hearing from users who have previously experienced responsiveness or I/O bottlenecks to look at Alpha.3 and provide their observations.

Dundee alpha.3 can be downloaded from the pre-release download page.     

Recent Comments
Sam McLeod
Hi Tim, Great to hear about the IO! As you know, we're very IO intensive and have very high speed flash-only iSCSI storage. I'll ... Read More
Thursday, 20 August 2015 01:50
Tim Mackey
Excellent, Sam. I look forward to hearing if you feel we've moved in the right direction and by how much.
Thursday, 20 August 2015 02:08
Chris Apsey
Has any testing been done with 40gbps NICS? We are considering upgrading to Chelsio T-580-LP-CRs, and if the ~24gbps barrier has ... Read More
Thursday, 20 August 2015 19:25
Continue reading
21085 Hits
10 Comments

XenServer's LUN scalability

"How many VMs can coexist within a single LUN?"

An important consideration when planning a deployment of VMs on XenServer is around the sizing of your storage repositories (SRs). The question above is one I often hear. Is the performance acceptable if you have more than a handful of VMs in a single SR? And will some VMs perform well while others suffer?

In the past, XenServer's SRs didn't always scale too well, so it was not always advisable to cram too many VMs into a single LUN. But all that changed in XenServer 6.2, allowing excellent scalability up to very large numbers of VMs. And the subsequent 6.5 release made things even better.

The following graph shows the total throughput enjoyed by varying numbers of VMs doing I/O to their VDIs in parallel, where all VDIs are in a single SR.

3541.png

In XenServer 6.1 (blue line), a single VM would experience modest 240 MB/s. But, counter-intuitively, adding more VMs to the same SR would cause the total to fall, reaching a low point around 20 VMs achieving a total of only 30 MB/s – an average of only 1.5 MB/s each!

On the other hand, in XenServer 6.5 (red line), a single VM achieves 600 MB/s, and it only requires three or four VMs to max out the LUN's capabilities at 820 MB/s. Crucially, adding further VMs no longer causes the total throughput to fall, but remains constant at the maximum rate.

And how well distributed was the available throughput? Even with 100 VMs, the available throughput was spread very evenly -- on XenServer 6.5 with 100 VMs in a LUN, the highest average throughput achieved by a single VM was only 2% greater than the lowest. The following graph shows how consistently the available throughput is distributed amongst the VMs in each case:

4016.png

Specifics

  • Host: Dell R720 (2 x Xeon E5-2620 v2 @ 2.1 GHz, 64 GB RAM)
  • SR: Hardware HBA using FibreChannel to a single LUN on a Pure Storage 420 SAN
  • VMs: Debian 6.0 32-bit
  • I/O pattern in each VM: 4 MB sequential reads (O_DIRECT, queue-depth 1, single thread). The graph above has a similar shape for smaller block sizes and for writes.
Recent Comments
Tobias Kreidl
Very nice, Jonathan, and it is always good to raise discussions about standards that are known to change over time. This is partic... Read More
Friday, 26 June 2015 19:52
Tobias Kreidl
Indeed, depending on the specific characteristics of each storage array there will be some maximum queue depth per connection (por... Read More
Saturday, 27 June 2015 04:27
Jonathan Davies
Thanks for your comments, Tobias and John. You're absolutely right -- the LUN's capabilities are an important consideration. And n... Read More
Monday, 29 June 2015 08:53
Continue reading
16762 Hits
6 Comments

Log Rotation and Syslog Forwarding

A Continuation of Root Disk Management

First, this article is applicable to any sized XenServer deployment and secondly, it is a continuation off of my previous article regarding XenServer root disk maintenance.  The difference is that - for all XenServer deployments - the topic revolves specifically with that of Syslog: from tuning log rotation, specifying the amount of logs to retain, leveraging compression, and of course... Syslog forwarding.

All of this is an effort to share tips to new (or seasoned) XenServer Administrators in the options available to ensure necessary Syslog data does not fill a XenServer root disk while ensuring - for certain industry specific requirements - that log-specific data is retained without sacrafice.

Syslog: A Quick Introduction

So, what is this Syslog?  In short it can be compared to the Unix/Linux equivalent of Windows Event Log (along with other logging mechanisms popular to specific applications/Operating Systems). 

The slightly longer explanation is that Syslog is not only a daemon, but also a protocol: established long ago for Unix systems to record system and application to local disk as well as offering the ability to forward the same log information to its peers for redundancy, concentration, and to conserve disk space on highly active systems.  For more detailed information on the finer details of the Syslog protocol and daemon one can review the IETF's specification at http://tools.ietf.org/html/rfc5424.

On a stand-alone XenServer, the Syslog daemon is started on boot and its configuration file for handling source, severity, types of logs, and where to store them are defined in /etc/syslog.conf.  It is highly recommended that one does not alter this file unless necessary and if one knows what they are doing.  From boot to reboot, information is stored in various files: found under the root disk's /var/log directory.

Taken from a fresh installation of XenServer, the following shows various log files that store information specific to a purpose.  Note that the items in "brown" are sub-directories:

For those seasoned in administering XenServer it is visible that from the kernel-level and user-space level there are not many log files.  However, XenServer is verbose about logging for a very simple reason: collection, analysis, and troubleshooting if an issue should arise.

So for a lone XenServer (by default) logs are essentially received by the Syslog daemon and based on /etc/syslog.conf - as well as the source and type of message - stored on the local root file system as discussed:

Within a pooled XenServer environment things are pretty much the same: for the most part.  As a pool has a master server, log data for the Storage Manager (as a quick example) is trickled up to the master server.  This is to ensure that while each pool member is recording log data specific to itself, the master server has the aggregate log data needed to promote troubleshooting of the entire pool from one point.

Log Rotation

Log rotation, or "logrotate", is what ensures that Syslog files in /var/log do not grow out of hand.  Much like Syslog, logrotate utilizes a configuration file to dictate how often, at what size, and if compression should be used when archiving a particular Syslog file.  The term "archive" is truly meant for rotating out a current log in place of a fresh, current log to take its place.

Post XenServer installation and before usage, one can measure the amount of free root disk space by executing the following command:

df -h

The output will be similar to the following and the line one should be most concerned with is in bold font:

Filesystem            Size  Used Avail Use% Mounted on
/dev/sda1             4.0G  1.9G  2.0G  49% /
none                  381M   16K  381M   1% /dev/shm
/opt/xensource/packages/iso/XenCenter.iso
                       52M   52M     0 100% /var/xen/xc-install

Once can see by the example that only 49% of the root disk on this XenServer host has been used.  Repeating this process as implementation ramps up, an administrator should be able to measure how best to tune logrotate's configuration file for after install, /etc/logrotate should resemble the following:

# see "man logrotate" for details
# rotate log files weekly
weekly
# keep 4 weeks worth of backlogs
rotate 4
# create new (empty) log files after rotating old ones
create
# uncomment this if you want your log files compressed
#compress
# RPM packages drop log rotation information into this directory
include /etc/logrotate.d
# no packages own wtmp -- we'll rotate them here
/var/log/wtmp {
    monthly
    minsize 1M
    create 0664 root utmp
    rotate 1
}
/var/log/btmp {
    missingok
    monthly
    minsize 1M
    create 0600 root utmp
    rotate 1
}
# system-specific logs may be also be configured here.

In previous versions, /etc/logrotate.conf was setup to retain 999 archived/rotated logs, but as of 6.2 the configuration above is standard. 

Before covering the basic premise and purpose of this configuration file, one can see this exact configuration file explained in more detail at http://www.techrepublic.com/article/manage-linux-log-files-with-logrotate/

The options declared in the default configuration are conditions that, when met, rotate logs accordingly:

  1. The first option specifies when to invoke log rotation.  By default this is set to weekly and may need to be adjusted for "daily".  This will only swap log files out for new ones and will not delete log files.
  2. The second option specifies how long to keep archived/rotate log files on the disk.  The default is to remove archived/rotated log files after a week.  This will delete log files that meet this age.
  3. The third options specifies what to do after rotating a log file out.  The default - which should not be changed is to create a new/fresh log after rotating out its older counterpart.
  4. The fourth option - which is commented out - specifies another what to do, but this time for the archived log files.  It is highly recommended to remove the comment mark so that archived log files are compressed: saving on disk space.
  5. A fifth option which is not present in the default conf is the "size" option.  This specifies how to handle logs that reach a certain size, such as "size 15M".  This option should be employed: especially if an administrator has SNMP logs that grow exponentially or notices that the particular XenServer's Syslog files are growing faster than logrotate can rotate and dispose of archived files.
  6. The "include" option specifies a sub-directory wherein unique, logrotate configurations can be specified for individual log files.
  7. The remaining portion should be left as is


In summary for logrotate, one is advised to measure use of the root disk using "df -h" and to tune logrotate.conf as needed to ensure Syslog does not inadvertently consume available disk space.

And Now: Syslog Forwarding

Again, this is a long standing feature and one I have been looking forward to explaining, highlighting, and providing examples for.  However, I have had a kind of writers block for many reasons: mainly that it ties into Syslog, Logrotate, and XenCenter, but also that there is a tradeoff.

I mentioned before that Syslog can forward messages to other hosts.  Furthermore, it can forward Syslog messages to other hosts without writing a copy of the log to local disk.  What this means is that a single XenServer or a pool of XenServers can send their log data to a "Syslog Aggregator".

The trade off is that one cannot generate a server status report via XenCenter, but instead gather the logs from the Syslog aggregate server and manually submit them for review.  That being said, one can ensure that low root disk space is not nearly as high of a concern on the "Admin Todo List" and can retain vast amounts of log data for a deployment of any size: based on dictated industry practices or for, sarcastically, nostalgic purposes.

The principles with Syslog and logrotate.conf will apply to the Syslog Aggregator as what good is a Syslog server if not configured properly as to ensure it does not fill itself up?  The requirements to instantiate a Syslog aggregation server, configure the forwarding of Syslog messages, and so forth are quite simple:

  1. Port 514 must be opened on the network
  2. The Syslog aggregation server must be reachable - either by being on the same network segment or not - by each XenServer host
  3. The Syslog aggregation server can be a virtual or physical machine; Windows or Linux-based with either a native Syslog daemon configured to receive external host messages or using a Windows-based Syslog solution offering the same "listening" capabilities.
  4. The Syslog aggregation server must have a static IP assigned to it
  5. The Syslog aggregation server should be monitored and tuned just as if it were Syslog/logrotate on a XenServer
  6. For support purposes, logs should be easily copied/compressed from the Syslog aggregation server - such as using WinSCP, scp, or other tools to copy log data for support's analysis

The quickest means to establish a simple virtual or physical Syslog aggregation server - in my opinion - is to reference the following two links.  These describe the installation of a base Debian-based system with specific intent to leverage Rsyslog for the recording of remote Syslog messages sent to it over UDP port 514 from one's XenServers:

http://www.aboutdebian.com/syslog.htm

http://www.howtoforge.com/centralized-rsyslog-server-monitoring

Alternatively, the following is an all-in-one guide (using Debian) with Syslog-NG:

http://www.binbert.com/blog/2010/04/syslog-server-installation-configuration-debian/

Once the server is instantiated and ready to record remote Syslog messages, it is time to open XenCenter.  Right click on a pool master or stand-alone XenServer and select "Properties":


In the window that appear - in the lower left-hand corner - is an option for "Log Destination":

To the right, one should notice the default option selected is "Local".  From there, select the "Remote" option and enter the IP address (or FQDN) of the remote Syslog aggregate server as follows:

Finally, select "OK" and the stand-alone XenServer (or pool) will update its Syslog configuration, or more specifically, /var/lib/syslog.conf.  The reason for this is so Elastic Syslog can take over the normal duties of Syslog: forwarding messages to the Syslog aggregator accordingly.

For example, once configured, the local /var/log/kern.log file will state:

Sep 18 03:20:27 bucketbox kernel: Kernel logging (proc) stopped.
Sep 18 03:20:27 bucketbox kernel: Kernel log daemon terminating.
Sep 18 03:20:28 bucketbox exiting on signal 15

Certain logs will still continue to record Syslog on the host, so it may be desirable to edit /var/lib/syslog.conf and add comments to lines where a "-/var/log/some_filename" is specified as lines with "@x.x.x.x" dictate to forward to the Syslog aggregator.  As an example, I have marked the lines in bold to show where comments should be added to prevent further logging to the local disk:

# Save boot messages also to boot.log
local7.*             @10.0.0.1
# local7.*         /var/log/boot.log

# Xapi rbac audit log echoes to syslog local6
local6.*             @10.0.0.1
# local6.*         -/var/log/audit.log

# Xapi, xenopsd echo to syslog local5
local5.*             @10.0.0.1
# local5.*         -/var/log/xensource.log

After one - The Administrator - has decided what logs to keep and what logs to forward, Elastic Syslog can be restarted as so the changes take affect by executing:

/etc/init.d/syslog restart

Since Elastic Syslog - a part of XenServer - is being utilized, the init script will ensure that Elastic Syslog is bounced and that it is responsible for handling Syslog forwarding, etc.

 

So, with this - I hope you find it useful and as always: feedback and comments are welcomed!

 

--jkbs | @xenfomation

 

 

 

Recent Comments
Tobias Kreidl
Super nice post, Jesse! One great reason to have logs on more than one server is that if there is ever a security issue, you stan... Read More
Thursday, 18 September 2014 17:12
JK Benedict
I could NOT agree more, Tobias and why I have been testing, experimenting, and really just trying to push the bounds as far as I c... Read More
Saturday, 27 September 2014 08:53
JK Benedict
Thank you, Tobias! Indeed, RSyslog and a base Debian install is my preferred choice for Syslog aggregation due to exactly what yo... Read More
Friday, 19 September 2014 03:08
Continue reading
51686 Hits
16 Comments

Pushing XenServer limits with Creedence beta.2

Well folks, its that time once again; we've another XenServer build ripe for your inspection, and this one is a critical one for a number of reasons. Today we've released XenServer Creedence beta.2, and this is binary compatible with a Citrix Tech Preview refresh. The build number is 87850 and it presents itself to the outside world as 6.4.95. Over the past few announcements I've hinted at pushing the boundaries of XenServer and wanting the community at large to "have at it", but I've not put out too many details on the overall performance we're seeing internally. The most important attribute of this build is that internally, its going to form part of a series of long term stability tests. Yes folks, we're that confident in what we're seeing and I wanted to thank everyone who has participated in our pre-release activities by sharing a few performance tidbits:

  • Creedence can start and run 1000 PV VMs with only 8GB dom0 memory. That's up from the 650 we have in XenServer 6.2.
  • Booting 125 Windows 7 VMs on a single host takes only 350 seconds in a bootstorm scenario. That's down from 850 seconds in XenServer 6.2
  • Aggregate disk throughput has been measured to improve by as much as 100% when compared to XenServer 6.2
  • Aggregate intrahost network throughput has been measured to improve by as much as 200% when compared to XenServer 6.2
  • The number of virtual disks per host has been raised by a factor of four when compared to XenServer 6.2

When compared to beta.1, the team has been looking at a number of performance and scalability system aspects, with a primary focus on dom0 idle state behavior at scale. This is a very important aspect of system operation as overall system responsiveness is directly tied to the overhead of managing a large number of VMs. We did see two distinct areas for investigation, and are inviting the community to look into these and provide us with others. Those two areas are:

  • When using 40Gb NICs outbound (transmit) performance is below expectations. We have some internal fixes, but are encouraging anyone with such NICs to test and report their findings
  • When large numbers of hosts are pooled we're seeing VM start times appear to slow unexpectedly under large pool VM densities.

 

As always we're actively encouraging you to test the beta and provide your feedback (both positive and negative) in an incident report. You can download beta.2 from here: http://xenserver.org/component/content/article/11-product/142-download-pre-release.html, and enter your feedback at https://bugs.xenserver.org.     

Recent Comments
Tim Mackey
@bpbp If you're comparing the ovs in Creedence to previous Creedence builds, then they are identical. If you're comparing it to ... Read More
Sunday, 24 August 2014 23:37
Tim Mackey
@vati, If I understand correctly, you're looking for when the DVSC will appear? The DVSC is being handled via the Citrix Tech Pr... Read More
Monday, 25 August 2014 14:32
Tim Mackey
@bpbp, We're hoping that the upgrade to ovs 2.1.2 will address most of the issues seen with the older ovs. If you're in a positi... Read More
Monday, 25 August 2014 14:33
Continue reading
18163 Hits
19 Comments

In-memory read caching for XenServer

Overview

In this blog post, I introduce a new feature of XenServer Creedence alpha.4, in-memory read caching, the technical details, the benefits it can provide, and how best to use it.

Technical Details

A common way of using XenServer is to have an OS image, which I will call the golden image, and many clones of this image, which I will call leaf images. XenServer implements cheap clones by linking images together in the form of a tree. When the VM accesses a sector in the disk, if a sector has been written into the leaf image, this data is retrieved from that image. Otherwise, the tree is traversed and data is retrieved from a parent image (in this case, the golden image). All writes go into the leaf image. Astute readers will notice that no writes ever hit the golden image. This has an important implication and allows read caching to be implemented.

tree.png

tapdisk is the storage component in dom0 which handles requests from VMs (see here for many more details). For safety reasons, tapdisk opens the underlying VHD files with the O_DIRECT flag. The O_DIRECT flag ensures that dom0's page cache is never used; i.e. all reads come directly from disk and all writes wait until the data has hit the disk (at least as far as the operating system can tell, the data may still be in a hardware buffer). This allows XenServer to be robust in the face of power failures or crashes. Picture a situation where a user saves a photo and the VM flushes the data to its virtual disk which tapdisk handles and writes to the physical disk. If this write goes into the page cache as a dirty page and then a power failure occurs, the contract between tapdisk and the VM is broken since data has been lost. Using the O_DIRECT flag allows this situation to be avoided and means that once tapdisk has handled a write for a VM, the data is actually on disk.

Because no data is ever written to the golden image, we don't need to maintain the safety property mentioned previously. For this reason, tapdisk can elide the O_DIRECT flag when opening a read-only image. This allows the operating system's page cache to be used which can improve performance in a number of ways:

  • The number of physical disk I/O operations is reduced (as a direct consequence of using a cache).
  • Latency is improved since the data path is shorter if data does not need to be read from disk.
  • Throughput is improved since the disk bottleneck is removed.

One of our goals for this feature was that it should have no drawbacks when enabled. An effect which we noticed initially was that data appeared to be read twice from disk which increases the number of I/O operations in the case where data is only read once from the VM. After a little debugging, we found that disabling O_DIRECT causes the kernel to automatically turn on readahead. Because data access pattern of a VM's disk tends to be quite random, this had a detrimental effect on the overall number of read operations. To fix this, we made use of a POSIX feature, posix_fadvise, which allows an application to inform the kernel how it plans to use a file. In this case, tapdisk tells the kernel that access will be random using the POSIX_FADV_RANDOM flag. The kernel responds to this by disabling readahead, and the number of read operations drops to the expected value (the same as when O_DIRECT is enabled).

Administration

Because of difficulties maintaining cache consistency across multiple hosts in a pool for storage operations, read caching can only be used with file-based SRs; i.e. EXT and NFS SRs. For these SRs, it is enabled by default. There shouldn't be any performance problems associated with this; however, if necessary, it is possible to disable read caching for an SR:

xe sr-param-set uuid=<UUID> other-config:o_direct=true

You may wonder how read caching differs from IntelliCache. The major difference is that IntelliCache works by caching reads from the network onto a local disk while in-memory read caching caches reads from either into memory. The advantage of in-memory read caching is that memory is still an order of magnitude faster than an SSD so performance in bootstorms and other heavy I/O situations should be improved. It is possible for them both to be enabled simultaneously; in this case reads from the network are cached by IntelliCache to a local disk and reads from that local disk are cached in memory with read caching. It is still advantageous to have IntelliCache turned on in this situation because the amount of available memory in dom0 may not be enough to cache the entire working set and reading the remainder from local storage is quicker than reading over the network. IntelliCache further reduces the load on shared storage when using VMs with disks that are not persistent across reboots by only writing to the local disk, not the shared storage.

Talking of available memory, XenServer admins should note that to make best use of read caching, the amount of dom0 memory may need to be increased. Ideally the amount of dom0 memory would be increased to the size of the golden image so that once cached, no more reads hit the disk. In case this is not possible, an approach to take would be to temporarily increase the amount of dom0 memory to the size of the golden image, boot up a VM and open the various applications typically used, determine how much dom0 memory is still free, and then reduce dom0's memory by this amount.

Performance Evaluation

Enough talk, let's see some graphs!

reads.png

In this first graph, we look at the number of bytes read over the network when booting a number of VMs on an NFS SR in parallel. Notice how without read caching, the number of bytes read scales proportionately with the number of VMs booted which checks out since each VM's reads go directly to the disk. When O_DIRECT is removed, the number of bytes read remains constant regardless of the number of VMs booted in parallel. Clearly the in-memory caching is working!

time.png

How does this translate to improvements in boot time? The short answer: see the graph! The longer answer is that it depends on many factors. In the graph, we can see that there is little difference in boot time when booting less than 4 VMs in parallel because the NFS server is able to handle that much traffic concurrently. As the number of VMs increases, the NFS server becomes saturated and the difference in boot time becomes dramatic. It is clear that for this setup, booting many VMs is I/O-limited so read caching makes a big difference. Finally, you may wonder why the boot time per VM increases slowly as the number of VMs increases when read caching is enabled. Since the disk is no longer a bottleneck, it appears that some other bottleneck has been revealed, probably CPU contention. In other words, we have transformed an I/O-limited bootstorm into a CPU-limited one! This improvement in boot times would be particularly useful for VDI deployments where booting many instances of the same VM is a frequent occurrence.

Conclusions

In this blog post, we've seen that in-memory read caching can improve performance in read I/O-limited situations substantially without requiring new hardware, compromising reliability, or requiring much in the way of administration.

As future work to improve in-memory read caching further, we'd like to remove the limitation that it can only use dom0's memory. Instead, we'd like to be able to use the host's entire free memory. This is far more flexible than the current implementation and would remove any need to tweak dom0's memory.

Credits

Thanks to Felipe Franciosi, Damir Derd, Thanos Makatos and Jonathan Davies for feedback and reviews.

Recent Comments
Tim Mackey
I'm not familiar with ZFS, but XenServer has had an shared storage cache called IntelliCache. It's designed for use in highly tem... Read More
Monday, 28 July 2014 02:19
Tobias Kreidl
Ross, Nice article! This cache is definitely going to help but as you pointed out, at some point, the size of the golden image wil... Read More
Tuesday, 29 July 2014 05:12
Tobias Kreidl
Apparently I hit a sore spot with you, "whatever"... I never said Nexenta was the best or most innovative solution out there, but... Read More
Thursday, 31 July 2014 04:28
Continue reading
40293 Hits
7 Comments

XenServer Creedence Tech Preview and Creedence Alpha

This morning astute followers of XenServer activity noticed that Citrix had made available the previously announced Tech Preview for Creedence. The natural follow on question is how this relates to the alpha program we've been running on xenserver.org. The easy answer is that the Citrix Tech Preview of XenServer Creedence is binary compatible with the alpha.4 pre-release binary you can get from xenserver.org. From the perspective of the core platform (i.e. XenServer virtualization bits), the only difference is in the EULA.

So why run a Tech Preview if you have a successful alpha program already?

That's where the differences between a pure open source effort and a commercial effort begin. While the XenServer platform components are binary compatible, Citrix customers have expectations for features which couldn't be made open source, or implementations directly supporting Citrix commercial products. Perfect examples of these features and implementations can be seen on the Tech Preview download page, such as Microsoft System Center Integration, expanded vGPU support for XenDesktop, and the return of both the DVSC and WLB. While there is no guarantee any of those features or specific implementations will be present in the final Citrix release, or for that matter under what license, Citrix is seeking your input on them and a Tech Preview program is how that is accomplished.

I can't access the Tech Preview site; what's wrong?

If you can't login to the Tech Preview site, that likely means your account isn't associated with a Citrix commercial contract for XenServer. Since the alpha.4 pre-release is binary compatible, you can experience all the platform improvements yourself by downloading XenServer from the xenserver.org pre-release download page.

How do I provide feedback?

If you are able to participate in the Tech Preview program, you'll find the options for feedback listed on the Tech Preview page. Of course, even if you can participate in the Tech Preview program we're always accepting XenServer feedback through our public feedback options:

The XenServer incident database: https://bugs.xenserver.org

Development feedback (xs-devel): https://lists.xenserver.org/sympa/info/xs-devel

What cool things are in alpha.4?

This is the best for last! We've already got in Creedence a 64 bit dom0, updated Linux 3.10 kernel, updated open virtual switch 2.1.2, improved boot storm handling, read caching for file based SRs; so what goodies are in here for the core platform people? Let's start with TRIM and UNMAP to better reclaim storage, then add in 32 bit to 64 bit VM migration to support upgrade scenarios and storage migration from XenServer 6.2 and prior, all with additional operating system validation with SLES 11 SP3 and Ubuntu 14.04 LTS.

What testing would you like us to do?

Don't let the alpha label fool you, alpha.4 of XenServer Creedence has been through quite a bit of testing, and is very much ready for you to try and stress it. The one thing we can say is that we're still working on the performance tuning so if you push things really hard, dom0 may run out of memory and you might need to follow CTX134951 to increase it (valid values are 8192, 16384 and 32768). This is particularly true if you're running more than 200 VMs per host, or need to attach more than 1200 virtual disks to VMs.     

Recent Comments
Tim Mackey
James, We now have the code in place to do a 32 bit to 64 bit migration, so upgrades are now on the table for testing, but I woul... Read More
Thursday, 10 July 2014 20:44
valentin radu
Hello, You can tell me when it will be released new stable version? Thx,
Tuesday, 15 July 2014 16:29
Hong Lae Cho
Hey Tim, Would you be able to provide a bit more clarification regarding "UNMAP" functionality in Creedence Alpha 4 & Tech Previ... Read More
Monday, 21 July 2014 05:10
Continue reading
25899 Hits
9 Comments

Overview of the Performance Improvements between XenServer 6.2 and Creedence Alpha 2

The XenServer Creedence Alpha 2 has been released, and one of the main focuses in Alpha 2 was the inclusion of many performance improvements that build on the architectural improvements seen in Alpha 1. This post will give you an overview of these performance improvements in Creedence, and will start a series of in-depth blog posts with more details about the most important ones.

Creedence Alpha 1 introduced several architectural improvements that aim to improve performance and fix a series of scalability limits found in XenServer 6.2:

  • A new 64-bit Dom0 Linux kernel. The 64-bit kernel will remove the cumbersome low/high-memory division present in the previous 32-bit Dom0 kernel, which limited the maximum amount of memory that Dom0 could use and which added memory access penalties in a Dom0 with more than 752MB RAM. This means that the Dom0 memory can now be arbitrarily scaled up to cope with memory demands of the latest vGPU, disk and network drivers, support for more VMs and internal caches to speed up disk access (see, for instance, the Read-caching section below).

  • Dom0 Linux kernel 3.10 with native support for the Xen Project hypervisor. Creedence Alpha 1 adopted a very recent long-term Linux kernel. This modern Linux kernel contains many concurrency, multiprocessing and architectural improvements over the old xen-Linux 2.6.32 kernel used previously in XenServer 6.2. It contains pvops features to run natively on the Xen Project hypervisor, and streamlined virtualization features used to increase datapath performance, such as a grant memory device that allows Dom0 user space processes to access memory from a guest (as long as the guest agrees in advance). Additionally, the latest drivers from hardware manufacturers containing performance improvements can be adopted more easily.

  • Xen Project hypervisor 4.4. This is the latest Xen Project hypervisor version available, and it improves on the previous version 4.1 on many accounts. It vastly increases the number of virtual event channels available for Dom0 -- from 1023 to 131071 -- which can translate into a correspondingly larger number of VMs per host and larger numbers of virtual devices that can be attached to them. XenServer 6.2 was using a special interim change that provided 4096 channels, which was enough for around 500 VMs per host with a few virtual devices in each VM. With the extra event channels in version 4.4, Creedence Alpha 1 can have each of these VMs endowed with a richer set of virtual devices. The Xen Project hypervisor 4.4 also handles grant-copy locking requests more efficiently, improving aggregate network and disk throughput; it facilitates future increases to the supported amount of host memory and CPUs; and it adds many other helpful scalability improvements.

  • Tapdisk3. The latest Dom0 disk backend design has been enabled by default for all the guest VBDs. While the previous tapdisk2 in XenServer 6.2 would establish a datapath to the guest in a circuitous way via a Dom0 kernel component, tapdisk3 in Creedence Alpha 1 establishes a datapath connected directly to the guest (via the grant memory device in the new kernel), minimizing latency and using less CPU. This results in big improvements in concurrent disk access and a much larger total aggregate disk throughput for the VBDs. We have measured aggregate disk throughput improvements of up to 100% on modern disks and machines accessing large blocksizes with large number of threads and observed local SSD arrays being maxed out when enough VMs and VBDs were used.

  • GRO enabled by default. The Generic Receive Offload is now enabled by default for all PIFs available to Dom0. This means that for GRO-capable NICs, incoming network packets will be transparently merged by the NIC and Dom0 will be interrupted less often to process the incoming data, saving CPU cycles and scaling much better with 10Gbps and 40Gbps networks. We have observed incoming single-stream network throughput improvements of 300% on modern machines.

  • Netback thread per VIF. Previously, XenServer 6.2 would have one netback thread for each existing Dom0 VCPU and a VIF would be permanently associated with one Dom0 VCPU. In the worst case, it was possible to end up with many VIFs forcibly sharing the same Dom0 VCPU thread, while other Dom0 VCPU threads were idle but unable to help. Creedence Alpha 2 improves this design and gives each VIF its own Dom0 netback thread that can run on any Dom0 VCPU. Therefore, the VIF load will now be spread evenly across all Dom0 VCPUs in all cases.

Creedence Alpha 2 then introduced a series of extra performance enhancements on top of the architecture improvements of Creedence Alpha 1:

  • Read-caching. In some situations, several VMs are all cloned from the same base disk so share much of their data while the few different blocks they write are stored in differencing-disks unique to each VM. In this case, it would be useful to be able to cache the contents of the base disk in memory, so that all the VMs can benefit from very fast access to the contents of the base disk, reducing the amount of I/O going to and from physical storage. Creedence Alpha 2 introduces this read caching feature enabled by default, which we expect to yield substantial performance improvements in the time it takes to boot VMs and other desktop and server applications where the VMs are mostly sharing a single base disk.

  • Grant-mapping on the network datapath. The pvops-Linux 3.10 kernel used in Alpha 1 had a VIF datapath that would need to copy the guest's network data into Dom0 before transmitting it to another guest or host. This memory copy operation was expensive and it would saturate the Dom0 VCPUs and limit the network throughput. A new design was introduced in Creedence Alpha 2, which maps the guest's network data into Dom0's memory space instead of copying it. This saves substantial Dom0 VCPU resources that can be used to increase the single-stream and aggregate network throughput even more. With this change, we have measured network throughput improvements of 250% for single-stream and 200% for aggregate stream over XenServer 6.2 on modern machines. 

  • OVS 2.1. An openvswitch network flow is a match between a network packet header and an action such as forward or drop. In OVS 1.4, present in XenServer 6.2, the flow had to have an exact match for the header. A typical server VM could have hundreds or more connections to clients, and OVS would need to have a flow for each of these connections. If the host had too many such VMs, the OVS flow table in the Dom0 kernel would become full and would cause many round-trips to the OVS userspace process, degrading significantly the network throughput to and from the guests. Creedence Alpha 2 has the latest OVS 2.1, which supports megaflows. Megaflows are simply a wildcarded language for the flow table allowing OVS to express a flow as group of matches, therefore reducing the number of required entries in the flow table for the most common situations and improving the scalability of Dom0 to handle many server VMs connected to a large number of clients.

Our goal is to make Creedence the most scalable and fastest XenServer release yet. You can help us in this goal by testing the performance features above and verifying if they boost the performance you can observe in your existing hardware.

Debug versus non-debug mode in Creedence Alpha

The Creedence Alpha releases use by default a version of the Xen Project hypervisor with debugging mode enabled to facilitate functional testing. When testing the performance of these releases, you should first switch to using the corresponding non-debugging version of the hypervisor, so that you can unleash its full potential suitable for performance testing. So, before you start any performance measurements, please run in Dom0:

cd /boot
ln -sf xen-*-xs?????.gz xen.gz   #points to the non-debug version of the Xen Project hypervisor in /boot

Double-check that the resulting xen.gz symlink is pointing to a valid file and then reboot the host.

You can check if the hypervisor debug mode is currently on or off by executing in Dom0:

xl dmesg | fgrep "Xen version"

and checking if the resulting line has debug=y or debug=n. It should be debug=n for performance tests.

You can reinstate the hypervisor debugging mode by executing in Dom0:

cd /boot
ln -sf xen-*-xs?????-d.gz xen.gz   #points to the debug (-d) version of the Xen Project hypervisor in /boot

and then rebooting the host.

Please report any improvements and regressions you observe on your hardware to the This email address is being protected from spambots. You need JavaScript enabled to view it. list. And keep an eye out for the next installments of this series!

Recent Comments
Tobias Kreidl
Turning off debugging will be very interesting to compare with the standard debug setting. I already have a set of benchmarks I ra... Read More
Wednesday, 18 June 2014 19:55
Tobias Kreidl
@James: Sure... Benchmarks with bonnie++ (V1.93c) on XenServer Creedence Alpha.2 with dom0 increased to 2 GB memory, otherwise th... Read More
Friday, 20 June 2014 04:30
Tobias Kreidl
Sure, here they are. The random operations seem to be worse for Alpha.2 in this case, but see an important note about this furthe... Read More
Monday, 23 June 2014 17:59
Continue reading
21263 Hits
11 Comments

XenServer Creedence Alpha 2 Released

We're pleased to announce that XenServer Creedence Alpha 2 has been released. Alpha 2 builds on the capabilities seen in Alpha 1, and we're interested your feedback on this release. With Alpha 1, we were primarily interested in receiving basic feedback on the stability of the code, with Alpha 2 we're interested in feedback not only on basic operations, but also storage performance.

The following functional enhancements are contained in Alpha 2.

  • Storage read caching. Boot storm conditions in environments using common templates can create unnecessary IO on shared storage systems. Storage read caching uses free dom0 memory to cache common read IO and reduce the impact of boot storms on storage networks and NAS devices.
  • DM Multipath storage support. For users of legacy MPP-RDAC, this functionality has been deprecated in XenServer Creedence following storage industry practices. If you are still using MPP-RDAC with XenServer 6.2 or prior, please enter an incident in https://bugs.xenserver.org to record your usage such that we can develop appropriate guidance.
  • Support for Ubuntu 14.04 and CentOS 5.10 as guest operating systems

The following performance improvements were observed with Alpha 2 compared to Alpha 1, but we'd like to hear your experiences.

  • GRO enabled physical network to guest network performance improved by 65%
  • Aggregate network throughput improved by 50%
  • Disk IO throughput improved by 100%

While these improvements are rather impressive, we do need to be aware this is alpha code. What this means in practice is that when we start looking at overall scalability the true performance numbers could go down a bit to ensure stable operations. That being said, if you have performance issues with this alpha we want to hear about them. Please also look to this blog space for updates from our performance engineering team detailing how some of these improvements were measured.

 

Please do download XenServer Creedence Alpha 2, and provide your feedback in our incident database.     

Recent Comments
Tobias Kreidl
In Creedence Alpha 1, we did not see any discernible storage performance difference compared to XS 6.2 SP1, so it will definitely ... Read More
Tuesday, 10 June 2014 14:42
James Bulpin
We'd very much like to move to CentOS 7 when it becomes available. However the timing of this, and the need to integrate and stabi... Read More
Friday, 13 June 2014 15:44
Bruno de Paula Larini
But there will be plans to support RHEL7/CentOS 7 guests?
Thursday, 26 June 2014 12:27
Continue reading
17479 Hits
17 Comments

The reality of a XenServer 64 bit dom0

One of the key improvements in XenServer Creedence is the introduction of a 64 bit control domain (dom0). This is something I've heard requests for over the past several years, and while there are many reasons to move to a 64 bit dom0, there are some equally good reasons for us to have waited this long to make the change. It's also important to distinguish between a 64 bit hypervisor, which XenServer has always used, and a 64 bit control domain which is the new bit.

Isn't XenServer 64 bit bare-metal virtualization?

The good news for people who think XenServer is 64 bit bare metal virtualization is they're not wrong. All versions of XenServer use the Xen Project hypervisor, and all versions of XenServer have configured that hypervisor to run in 64 bit mode. In a Xen Project based environment, the first thing the hypervisor does is load its control domain (dom0). For all versions of XenServer prior to Creedence, that control domain was 32 bit.

The goodness of 64 bit

A 64 bit Linux dom0 allows us to remove a number of bottlenecks which were arbitrarily present with a 32bit Linux dom0.

Goodbye low memory

One of the biggest limiting factors with a 32bit Linux dom0 is the concept of "low memory". On 32 bit computers, the maximum directly addressable memory is 4GB. There are a variety of extensions available to address beyond the 4GB limit, but they all come with a performance and complexity cost. Further, as 32bit Linux evolved, a concept of "Low" memory and "High" memory was created with most everything related to the kernel using low memory and userspace processes running in high memory. Since we're talking about kernel operations consuming low memory, that also means that any kernel memory maps and kernel drivers also consume low memory and you can see how low memory quickly becomes a precious commodity. It also can be increasingly consumed the more memory available to a 32bit Linux dom0. In a typical XenServer installation this value is only 752MB, and with the move to a 64bit dom0 will soon be a thing of the past.

Improved driver stability

In a 32bit BIOS, MMIO regions are always placed between the 1GB and 3GB physical address space, and with a 64bit BIOS they are always placed at the top of the physical memory. If an MMIO hole is created dom0 can choose to re-map that memory so that it is now shadowed in virtual address space. The kernel must map a roughly equal amount of memory to the size of the MMIO holes, which in a 32bit kernel must be done in low memory. Many drivers map kernel memory to MMIO holes on demand, resulting in boot time success but potential instability in the driver if there is insufficient low memory to satisfy the dynamic request for a MMIO hole remap. Additionally, while XenServer currently supports PAE, and can address more than 4GB, if the driver insists on having its memory allocated in 32bit physical address space, it will fail.

Support for more modern drivers

Hardware vendors want to ensure the broadest adoption for their devices, and with modern computers shipping with 64bit processors and memory configurations well in excess of 4GB, the majority of those drivers have been authored for 64bit operating systems, and 64bit Linux is no different. Moving to a 64bit dom0 gives us an opportunity to incorporate those newer devices and potentially have fewer restrictions on the quantity of devices in a system due to kernel memory usage.

Improved computational performance

During the initial wave of operating systems moving from 32bit to 64bit configurations, one of the often cited benefits was the computational improvements offered from such a migration. We fully intend for dom0 to take advantage of general purpose improvements offered from the processor no longer needing to operate in what is today more of a legacy configuration. One of the most immediate benefits is with a 64bit dom0, we can use 64bit compiler settings and take advantage of modern processor extensions. Additionally, there are several performance concessions (such as the quantity of event channels) and code paths which can be optimized when compiled for a 64bit system versus a 32bit system.

Challenges do exist, though

While there are clear benefits to a 64 bit migration, the move isn't without its set of issues as well. The most significant issue relates to XenServer pool communications. In a XenServer pool, all hosts have historically been required to be at the same version and patch level except during the process of upgrading the pool. This restriction serves to ensure that any data being passed between hosts, and configuration information being replicated in xenstore and all operations are consistent. With the introduction of Storage XenMotion in XenServer 6.1, this requirement was also extended to cross pool operations when a vdi is being migrated.

 

In essence you can think of the version requirement as being the insurance against bad things happening to your precious VM cargo. Unfortunately, that requirement has an assumption of dom0 having the same architecture and our migration from 32bit to 64bit complicates things. This is due to the various data structures used in pool operations having been designed with a consistent view of a world based on a 32bit operating system. This view of the world is directly challenged with a pool consisting of both 32bit and 64bit XenServer hosts as would be present during an upgrade. It's also challenged during cross pool storage live migration if one pool is 32bit while the second is 64bit. We are working to resolve this problem at least for the 32bit to 64bit upgrade, but it will be something we're going to want very active participation from our user community to test once we have completed our implementation efforts. After all we do subscribe to the notion that your virtual machines are pets and not cattle; regardless of the scale of your virtualized infrastructure.     

Recent Comments
chaitanya
Hi, Good concept!! so, will this be included in alpha.2? Chaitanya.
Friday, 06 June 2014 17:52
Tim Mackey
Chaitanya, Alpha.1 already has a 64 bit dom0, and with Alpha.2 we're bringing more improvements online. -tim... Read More
Tuesday, 10 June 2014 17:15
Tobias Kreidl
Great synopsis, Tim. This release is going to be the best ever.
Tuesday, 10 June 2014 14:38
Continue reading
32416 Hits
9 Comments

Patching XenServer at Scale

In January, I posted a how-to guide covering the installation of XenServer in a large scale environment, and this month we're going to talk about patching XenServer in a similar environment. Patching any operating environment is an important aspect of running a production installation, and XenServer is no different. Patching a XenServer manually can be done in one of two ways; either through XenCenter and its rolling pool upgrade option or via the CLI. The rolling pool upgrade wizard has been available since XenServer 6.0, and not only applies hotfixes to all the servers in a pool in the correct order, but also ensures any running VMs are migrated if reboots are required. If you prefer to apply the patches using the CLI, it becomes your responsibility to perform the VM migration, but the process is quite simple. XenServer customers with a Citrix support contract can utilize the rolling pool upgrade wizard, while free users have the option of manually patching using the CLI. Of course these two options can be used in a large scale environment, but generally the requirement is to script everything, and that's where this blog comes in.

Assumptions

The core assumption in the script in this blog is that the XenServer hosts are not in a pool. If the hosts are in a pool, then you should apply patches to the pool master first, and then any slaves. Since we're building on the environment in my previous blog which had standalone hosts, this assumption is valid.

Preparation Steps

  1. Download the desired hotfixes, patches and service packs from either citrix.com (http://support.citrix.com/product/xens/v6.2.0/) or xenserver.org (http://xenserver.org/overview-xenserver-open-source-virtualization/download.html)
  2. Extract the xsupdate file of each patch into a directory on an NFS share
  3. Test each patch to verify it works in your environment. While not required, I always like to do this because QA can't know every possible configuration and bugs do happen.
  4. Create a file named manifestand place it in the same directory as the xsupdate files. The manifest file will contain a single line for each patch, and those patches will be processed in order. An example manifest file is provided below, and any given line can be commented out using the hash (#) character.
    XS62E001.xsupdate
    XS62E002.xsupdate
    XS62E004.xsupdate
    XS62E005.xsupdate
    XS62E009.xsupdate
    XS62E010.xsupdate
    XS62E011.xsupdate
    XS62E012.xsupdate
    XS62ESP1.xsupdate
  5. Create a script file named apply-patches.shand place it in a known location. The contents of the script will be
    #!/bin/sh 
    # apply all XenServer patches which have been approved in our manifest
    
    mkdir /mnt/xshotfixes
    mount 192.168.98.3:/vol/exports/isolibrary/xs-hotfixes /mnt/xshotfixes
    
    
    HOSTNAME=$(hostname)
    HOSTUUID=$(xe host-list name-label=$HOSTNAME --minimal)
    while read PATCH
    do 
    if [ "$(echo "$PATCH" | head -c 1)" != '#' ]
    then 
    	PATCHNAME=$(echo "$PATCH" | awk -F: '{ split($1,a,"."); printf ("%s\n", a[1]); }')
    	echo "Processing $PATCHNAME"
    	PATCHUUID=$(xe patch-list name-label=$PATCHNAME hosts=$HOSTUUID --minimal)
    	if [ -z "$PATCHUUID" ]
    	then
    		echo "Patch not yet applied; applying .."
    		PATCHUUID=$(xe patch-upload file-name=/mnt/xshotfixes/$PATCH)
    		if [ -z "$PATCHUUID" ] #empty uuid means patch uploaded, but not applied to this host
    		then
    			PATCHUUID=$(xe patch-list name-label=$PATCHNAME --minimal)
    		fi
    		#apply the patch to *this* host only
    		xe patch-apply uuid=$PATCHUUID host-uuid=$HOSTUUID
    
    		# remove the patch files to avoid running out of disk space in the future
    		xe patch-clean uuid=$PATCHUUID 
    		
    		#figure out what the patch needs to be fully applied and then do it
    		PATCHACTIVITY=$(xe patch-list name-label=$PATCHNAME params=after-apply-guidance | sed -n 's/.*: \([.]*\)/\1/p')
    		if [ "$PATCHACTIVITY" == 'restartXAPI' ]
    		then
    			xe-toolstack-restart
    			# give time for the toolstack to restart before processing any more patches
    			sleep 60
    		elif [ "$PATCHACTIVITY" == 'restartHost' ]
    		then
    			# we need to rebot, but we may not be done.
    			# need to create a link to our script
    			
    			# first find out if we're already being run from a reboot
    			MYNAME="`basename \"$0\"`"
    			if [ "$MYNAME" == 'apply-patches.sh' ]
    			then
    				# I'm the base path so copy myself to the correct location
    				cp "$0" /etc/rc3.d/S99zzzzapplypatches  
    			fi
    			
    			reboot
    			exit
    		fi
    		
    	else
    		echo "$PATCHNAME already applied"
    	fi
    	
    fi
    done < "/mnt/xshotfixes/manifest"
    
    echo "done"
    umount /mnt/xshotfixes
    rmdir /mnt/xshotfixes
    
    # lastly if I'm running as part of a reboot; kill the link
    rm -f /etc/rc3.d/S99zzzzapplypatches 

Applying Patches

Applying patches is as simple as running the script file and letting it do what it needs to do. Here's how it works...

  1. We need to find out if the patch has already been applied.
  2. If the patch hasn't been applied, we want to upload it and then apply it. Since any given patch might require the toolstack to be restarted, we check for that and restart the toolstack. Additionally we need to handle the case where the patch might require a reboot. If that's the case, we want to reboot, but also might need to process additional patches. To account for that, we'll insert ourself into the reboot sequence to keep processing more patches until we've reached the end.
  3. Since we want to be sensitive to disk space usage, we'll cleanup the patch files once each patch has been applied.

 

This script becomes quite valuable when used in conjunction with the provisioning script in my blog on installing XenServer at scale. Simply copy the patch script to /etc/rc3.d/S99zzzzapplypatches and add that command to first-boot-script.sh prior to the final reboot. With the combination of these two scripts, you now can install XenServer at scale, and ensure those newly installed XenServer hosts are fully patched from the beginning.     

Recent Comments
Tim Mackey
Thanks Matthew. The HTML tidy code in the blog editor ate it.
Wednesday, 19 March 2014 15:48
Anthony
Hi Tim Mackey, Thanks a great deal for you hard work, just the script I've been looking for. I'm a newby in bash shell scrip... Read More
Wednesday, 26 March 2014 17:24
Tim Mackey
Anthony, For this script the only thing you'd want to change is the IP address of the NFS server and the export point where you p... Read More
Tuesday, 01 April 2014 14:32
Continue reading
46830 Hits
27 Comments

XenServer Status - October 2013

This past month saw some significant progress toward our objective of converting XenServer from a closed source product developed within Citrix to an open source project.  This is a process which is considerably more difficult and detailed than simply announcing that we’re now open source, and I’m pleased to announce that in October we completed the publication of all sources making up XenServer.  While there is considerable work left to be done, not only can interested parties view all of the code, but we have also posted nightly snapshots of the last thirty builds from trunk.  For organizations looking to integrate with XenServer, these builds represent an ideal early access program from which to test integrations.

In addition to the code progress, we’ve also been busy building capabilities and supporting a vibrant ecosystem.  Some of the highlights include:

Now no status report would be complete without some metrics, and we’ve got some pretty decent stats as well.  Unique visitors to xenserver.org in October were up 12% to over 26,000.  Downloads of the core XenServer installation ISO directly from xenserver.org were up by over 1000 downloads.  Mailing list activity was up 50% and we had over 80 commits to the XenServer repositories.  What’s even more impressive with these numbers is that XenServer is built from a number of other open source projects, so the real activity level within XenServer is considerably larger.

At the end of the day this is one month, but it is a turning point.  I’ve been associated in one form or another with XenServer since 2008, and even way back then there were many who expected XenServer was unlikely to be around for long.  Five years later there are more competing solutions, but the future for XenServer is as solid as ever.  We’re working through some of the technical issues which have artificially limited XenServer in recent years, but we are making significant progress.  If you are looking for a solid, high performance, open source virtualization platform; then XenServer needs to be on your list.  If you are looking to contain the costs of delivering virtualized infrastructure, the same holds true.  

More important than all these excellent steps forward is how XenServer can benefit the ecosystem of vendors and fellow open source projects which are required to fully deliver virtualized infrastructure at large scale.  Over the next several months I’m going to be reaching out to various constituencies to see what we should be doing to make participating in the ecosystem more valuable.  If you want to be included in that process, please let me know.

Continue reading
14304 Hits
0 Comments

How did we increase VM density in XenServer 6.2? (part 2)

In a previous article, I described how dom0 event channels can cause a hard limitation on VM density scalability.

Event channels were just one hard limit the XenServer engineering team needed to overcome to allow XenServer 6.2 to support up to 500 Windows VMs or 650 Linux VMs on a single host.

In my talk at the 2013 Xen Developer Summit towards the end of October, I spoke about a further six hard limits and some soft limits that we overcame along the way to achieving this goal. This blog article summarises that journey.

Firstly, I'll explain what I mean by hard and soft VM density limits. A hard limit is where you can run a certain number of VMs without any trouble, but you are unable to run one more. Hard limits arise when there is some finite, unsharable resource that each VM consumes a bit of. On the other hand, a soft limit is where performance degrades with every additional VM you have running; there will be a point at which it's impractical to run more than a certain number of VMs because they will be unusable in some sense. Soft limits arise when there is a shared resource that all VMs must compete for, such as CPU time.

Here is a run-down of all seven hard limits, how we mitigated them in XenServer 6.2, and how we might be able to push them even further back in future:

  1. dom0 event channels

    • Cause of limitation: XenServer uses a 32-bit dom0. This means a maximum of 1,024 dom0 event channels.
    • Mitigation for XenServer 6.2: We made a special case for dom0 to allow it up to 4,096 dom0 event channels.
    • Mitigation for future: Adopt David Vrabel's proposed change to the Xen ABI to provide unlimited event channels.
  2. blktap2 device minor numbers

    • Cause of limitation: blktap2 only supports up to 1,024 minor numbers, caused by #define MAX_BLKTAP_DEVICE in blktap.h.
    • Mitigation for XenServer 6.2: We doubled that constant to allow up to 2,048 devices.
    • Mitigation for future: Move away from blktap2 altogether?
  3. aio requests in dom0

    • Cause of limitation: Each blktap2 instance creates an asynchronous I/O context for receiving 402 events; the default system-wide number of aio requests (fs.aio-max-nr) was 444,416 in XenServer 6.1.
    • Mitigation for XenServer 6.2: We set fs.aio-max-nr to 1,048,576.
    • Mitigation for future: Increase this parameter yet further. It's not clear whether there's a ceiling, but it looks like this would be okay.
  4. dom0 grant references

    • Cause of limitation: Windows VMs used receive-side copy (RSC) by default in XenServer 6.1. In netbk_p1_setup, netback allocates 22 grant-table entries per virtual interface for RSC. But dom0 only had a total of 8,192 grant-table entries in XenServer 6.1.
    • Mitigation for XenServer 6.2: We could have increased the size of the grant-table, but for other reasons RSC is no longer the default for Windows VMs in XenServer 6.2, so this limitation no longer applies.
    • Mitigation for future: Continue to leave RSC disabled by default.
  5. Connections to xenstored

    • Cause of limitation: xenstored uses select(2), which can only listen on up to 1,024 file descriptors; qemu opens 3 file descriptors to xenstored.
    • Mitigation for XenServer 6.2: We made two qemu watches share a connection.
    • Mitigation for future: We could modify xenstored to accept more connections, but in the future we expect to be using upstream qemu, which doesn't connect to xenstored, so it's unlikely that xenstored will run out of connections.
  6. Connections to consoled

    • Cause of limitation: Similarly, consoled uses select(2), and each PV domain opens 3 file descriptors to consoled.
    • Mitigation for XenServer 6.2: We use poll(2) rather than select(2). This has no such limitation.
    • Mitigation for future: Continue to use poll(2).
  7. dom0 low memory

    • Cause of limitation: Each running VM eats about 1 MB of dom0 low memory.
    • Mitigation for future: Using a 64-bit dom0 would remove this limit.

Summary of limits

Okay, so what does this all mean in terms of how many VMs you can run on a host? Well, since some of the limits concern your VM configuration, it depends on the type of VM you have in mind.

Let's take the example of Windows VMs with PV drivers, each with 1 vCPU, 3 disks and 1 network interface. Here are the number of those VMs you'd have to run on a host in order to hit each limitation:

Limitation XS 6.1 XS 6.2 Future
dom0 event channels 150 570 no limit
blktap minor numbers 341 682 no limit
aio requests 368 869 no limit
dom0 grant references 372 no limit no limit
xenstored connections 333 500 no limit
consoled connections no limit no limit no limit
dom0 low memory 650 650 no limit

The first limit you'd arrive at in each release is highlighted. So the overall limit is event channels in XenServer 6.1, limiting us to 150 of these VMs. In XenServer 6.2, it's the number of xenstore connections that limits us to 500 VMs per host. In the future, none of these limits will hit us, but there will surely be an eighth limit when running many more than 500 VMs on a host.

What about Linux guests? Here's where we stand for paravirtualised Linux VMs each with 1 vCPU, 1 disk and 1 network interface:

Limitation XS 6.1 XS 6.2 Future
dom0 event channels 225 1000 no limit
blktap minor numbers 1024 2048 no limit
aio requests 368 869 no limit
dom0 grant references no limit no limit no limit
xenstored connections no limit no limit no limit
consoled connections 341 no limit no limit
dom0 low memory 650 650 no limit

This explains why the supported limit for Linux guests can be as high as 650 in XenServer 6.2. Again, in the future, we'll likely be limited by something else above 650 VMs.

What about the soft limits?

After having pushed the hard limits such a long way out, we then needed to turn our attention towards ensuring that there weren't any soft limits that would make it infeasible to run a large number of VMs in practice.

Felipe Franciosi has already described how qemu's utilisation of dom0 CPUs can be reduced by avoiding the emulation of unneeded virtual devices. The other major change in XenServer 6.2 to reduce dom0 load was to reduce the amount of xenstore traffic. This was achieved by replacing code that polled xenstore with code that registers watches on xenstore and by removing some spurious xenstore accesses from the Windows guest agent.

These things combine to keep dom0 CPU load down to a very low level. This means that VMs can remain healthy and responsive, even when running a very large number of VMs.

Recent comment in this post
Tobias Kreidl
We see xenstored eat anywhere from 30 to 70% of a CPU with something like 80 VMs running under XenServer 6.1. When major updates t... Read More
Wednesday, 13 November 2013 17:10
Continue reading
26897 Hits
1 Comment

How did we increase VM density in XenServer 6.2?

One of the most noteworthy improvements in XenServer 6.2 is the support for a significantly increased number of VMs running on a host: now up to 500 Windows VMs or 650 Linux VMs.

We needed to remove several obstacles in order to achieve this huge step up. Perhaps the most important of the technical changes that led to this was to increase in the number of event channels available to dom0 (the control domain) from 1024 to 4096. This blog post is an attempt to shed some light on what these event channels are, and why they play a key role in VM density limits.

What is an event channel?

It's a channel for communications between a pair of VMs. An event channel is typically used by one VM to notify another VM about something. For example, a VM's paravirtualised disk driver would use an event channel to notify dom0 of the presence of newly written data in a region of memory shared with dom0.

Here are the various things that a VM requires an event channel for:

  • one per virtual disk;
  • one per virtual network interface;
  • one for communications with xenstore;
  • for HVM guests, one per virtual CPU (rising to two in XenServer 6.2); and
  • for PV guests; one to communicate with the console daemon.


Therefore VMs will typically require at least four dom0 event channels depending on the configuration of the VM. Requiring more than ten is not an uncommon configuration.

Why can event channels cause scalability problems when trying to run lots of VMs?

The total number of event channels any domain can use is part of a shared structure in the interface between a paravirtualised VM and the hypervisor; it is fixed at 1024 for 32-bit domains such as XenServer's dom0. Moreover, there are normally around 50--100 event channels used for other purposes, such as physical interrupts. This is normally related to the number of physical devices you have in your host. This overhead means that in practice there might be not too many more than 900--950 event channels available for VM use. So the number of available event channels becomes a limited resource that can cause you to experience a hard limit on the number of VMs you can run on a host.

To take an example: Before XenServer 6.2, if each of your VMs requires 6 dom0 event channels (e.g. an HVM guest with 3 virtual disks, 1 virtual network interface and 1 virtual CPU) then you'll probably find yourself running out of dom0 event channels if you go much over 150 VMs.

In XenServer 6.2, we have made a special case for our dom0 to allow it to behave differently to other 32-bit domains to allow it to use up to four times the normal number of event channels. Hence there are now a total of 4096 event channels available.

So, on XenServer 6.2 in the same scenario as the example above, even though each VM of this type would now use 7 dom0 event channels, the increased total number of dom0 event channels means you'd have to run over 570 of them before running out.

What happens when I run out of event channels?

On VM startup, the XenServer toolstack will try to plumb all the event channels through from dom0 to the nascent VM. If there are no spare slots, the connection will fail. The exact failure mode depends on which subsystem the event channel was intended for use in, but you may see error messages like these when the toolstack tries to connect up the next event channel after having run out:

error 28 mapping ring-refs and evtchn
message: xenopsd internal error: Device.Ioemu_failed("qemu-dm exited unexpectedly")

In other words, it's not pretty. The VM either won't boot or will run with reduced functionality.

That sounds scary. How can I tell whether there's sufficient spare event channels to start another VM?

XenServer has a utility called "lsevtchn" that allows you to inspect the event channel plumbing.

In dom0, run the following command to see what event channels are connected to a particular domain.

/usr/lib/xen/bin/lsevtchn 

For example, here is the output from a PV domain with domid 36:

[root@xs62 ~]# /usr/lib/xen/bin/lsevtchn 36
   1: VCPU 0: Interdomain (Connected) - Remote Domain 0, Port 51
   2: VCPU 0: Interdomain (Connected) - Remote Domain 0, Port 52
   3: VCPU 0: Virtual IRQ 0
   4: VCPU 0: IPI
   5: VCPU 0: IPI
   6: VCPU 0: Virtual IRQ 1
   7: VCPU 0: IPI
   8: VCPU 0: Interdomain (Connected) - Remote Domain 0, Port 55
   9: VCPU 0: Interdomain (Connected) - Remote Domain 0, Port 53
  10: VCPU 0: Interdomain (Connected) - Remote Domain 0, Port 54
  11: VCPU 0: Interdomain (Connected) - Remote Domain 0, Port 56

You can see that six of this VM's event channels are connected to dom0.

But the domain we are most interested in is dom0. The total number of event channels connected to dom0 can be determined by running

/usr/lib/xen/bin/lsevtchn 0 | wc -l

Before XenServer 6.2, if that number is close to 1024 then your host is on the verge of not being able to run an additional VM. On XenServer 6.2, the number to watch out for is 4096. However, before you'd be able to get enough VMs up and running to approach that limit, there are various other things you might run into depending on configuration and workload. Watch out for further blog posts describing how we have cleared more of these hurdles in XenServer 6.2.

Continue reading
69012 Hits
0 Comments

About XenServer

XenServer is the leading open source virtualization platform, powered by the Xen Project hypervisor and the XAPI toolstack. It is used in the world's largest clouds and enterprises.
 
Technical support for XenServer is available from Citrix.