Virtualization Blog

Discussions and observations on virtualization.
Jonathan is a Principal Software Engineer at Citrix, where he is the lead engineer for XenServer's Performance Team. This team has oversight of the performance and scalability of all aspects of XenServer.

XenServer 7.0 performance improvements part 4: Aggregate I/O throughput improvements

The XenServer team has made a number of significant performance and scalability improvements in the XenServer 7.0 release. This is the fourth in a series of articles that will describe the principal improvements. For the previous ones, see:

  1. http://xenserver.org/blog/entry/dundee-tapdisk3-polling.html
  2. http://xenserver.org/blog/entry/dundee-networking-multi-queue.html
  3. http://xenserver.org/blog/entry/dundee-parallel-vbd-operations.html

In this article we return to the theme of I/O throughput. Specifically, we focus on improvements to the total throughput achieved by a number of VMs performing I/O concurrently. Measurements show that XenServer 7.0 delivers aggregate network throughput over three times that of XenServer 6.5, and also improves aggregate storage throughput.

What limits aggregate I/O throughput?

When a number of VMs are performing I/O concurrently, the total throughput that can be achieved is often limited by dom0 becoming fully busy, meaning it cannot do any additional work per unit time. The I/O backends (netback for network I/O and tapdisk3 for storage I/O) together consume 100% of available dom0 CPU time.
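
If you want to check whether a host of your own is hitting this limit, a quick way (just a sketch, run from dom0) is to watch how much CPU dom0 consumes while the VMs are driving I/O:

# Per-domain CPU usage, refreshed every second; if Domain-0 sits near
# (number of dom0 vCPUs x 100) percent, dom0 is the bottleneck
xentop -d 1

# To see which backends are consuming the time, run top in dom0 and look
# for the netback kernel threads and the tapdisk processes near the top
top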

How can this limit be overcome?

Whenever there is a CPU bottleneck like this, there are two possible approaches to improving the performance:

  1. Reduce the amount of CPU time required to perform I/O.
  2. Increase the processing capacity of dom0, by giving it more vCPUs.

Surely approach 2 is easy and will give a quick win...? Intuitively, we might expect the total throughput to increase proportionally with the number of dom0 vCPUs.

Unfortunately it's not as straightforward as that. The following graph shows what happens to the aggregate network throughput on XenServer 6.5 when the number of dom0 vCPUs is artificially increased. (In this case, we are measuring the total network throughput of 40 VMs communicating amongst themselves on a single Dell R730 host.)

[Figure: aggregate network throughput of 40 VMs on XenServer 6.5 as the number of dom0 vCPUs is increased]

Counter-intuitively, the aggregate throughput decreases as we add more processing power to dom0! (This explains why the default was at most 8 vCPUs in XenServer 6.5.)

So is there no hope for giving dom0 more processing power...?

The explanation for the degradation in performance is that certain operations run more slowly when there are more vCPUs present. In order to make dom0 work better with more vCPUs, we needed to understand what those operations are, and whether they can be made to scale better.

Three such areas of poor scalability were discovered deep in the innards of Xen by Malcolm Crossley and David Vrabel, and improvements were made for each:

  1. Maptrack lock contention – improved by http://xenbits.xen.org/gitweb/?p=xen.git;a=commit;h=dff515dfeac4c1c13422a128c558ac21ddc6c8db
  2. Grant-table lock contention – improved by http://xenbits.xen.org/gitweb/?p=xen.git;a=commitdiff;h=b4650e9a96d78b87ccf7deb4f74733ccfcc64db5
  3. TLB flush on grant-unmap – improved by https://github.com/xenserver/xen-4.6.pg/blob/master/master/avoid-gnt-unmap-tlb-flush-if-not-accessed.patch

The result of improving these areas is dramatic – see the green line in the following graph:

[Figure: aggregate network throughput versus number of dom0 vCPUs, before and after the Xen scalability fixes]

Now, throughput scales very well as the number of vCPUs increases. This means that, for the first time, it is now beneficial to allocate many vCPUs to dom0 – so that when there is demand, dom0 can deliver. Hence we have given XenServer 7.0 a higher default number of dom0 vCPUs.

How many vCPUs are now allocated to dom0 by default?

Most hosts will now get 16 vCPUs by default, but the exact number depends on the number of CPU cores on the host. The following graph summarises how the default number of dom0 vCPUs is calculated from the number of host CPU cores, for various current and historic XenServer releases:

[Figure: default number of dom0 vCPUs as a function of host CPU cores, across XenServer releases]
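
To see how many vCPUs your own dom0 has been given, a couple of illustrative checks you can run from dom0 (the xl tool may or may not be present, depending on your installation):

# The dom0 kernel sees one CPU per dom0 vCPU
grep -c ^processor /proc/cpuinfo

# If xl is available, ask the hypervisor directly for dom0's vCPUs
xl vcpu-list 0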

Summary of improvements

I will conclude with some aggregate I/O measurements comparing XenServer 6.5 and 7.0 under default settings (no dom0 configuration changes) on a Dell R730xd.

  1. Aggregate network throughput – twenty pairs of 32-bit Debian 6.0 VMs sending and receiving traffic generated with iperf 2.0.5.
    [Figure: aggregate network throughput, XenServer 6.5 versus 7.0]
  2. Aggregate storage IOPS – twenty 32-bit Windows 7 SP1 VMs each doing single-threaded, serial, sequential 4KB reads with fio to a virtual disk on an Intel P3700 NVMe drive.
    [Figure: aggregate storage IOPS, XenServer 6.5 versus 7.0]
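
For anyone wanting to run a comparable storage test, here is a minimal fio sketch of the workload in point 2, in the form you would use in a Linux guest; the device path is a placeholder and should be replaced with the virtual disk you actually want to test:

# Single-threaded, sequential 4 KB direct reads for 60 seconds
fio --name=seqread4k --filename=/dev/xvdb --rw=read --bs=4k \
    --ioengine=libaio --iodepth=1 --numjobs=1 --direct=1 \
    --time_based --runtime=60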

XenServer 7.0 performance improvements part 2: Parallelised networking datapath

The XenServer team has made a number of significant performance and scalability improvements in the XenServer 7.0 release. This is the second in a series of articles that will describe the principal improvements. For the first, see http://xenserver.org/blog/entry/dundee-tapdisk3-polling.html.

The topic of this post is network I/O performance. XenServer 7.0 achieves significant performance improvements through support for multi-queue paravirtualised network interfaces. Measurements of one particular use-case show an improvement from 17 Gb/s to 41 Gb/s.

A bit of background about the PV network datapath

In order to perform network-based communications, a VM employs a paravirtualised network driver (netfront in Linux or xennet in Windows) in conjunction with netback in the control domain, dom0.

[Figure: the single-queue PV network datapath between netfront and netback]

To the guest OS, the netfront driver feels just like a physical network device. When a guest wants to transmit data:

  • Netfront puts references to the page(s) containing that data into a "Transmit" ring buffer it shares with dom0.
  • Netback in dom0 picks up these references and maps the actual data from the guest's memory so it appears in dom0's address space.
  • Netback then hands the packet to the dom0 kernel, which uses normal routing rules to determine that it should go to an Open vSwitch device and then on to either a physical interface or the netback device for another guest on the same host.

When dom0 has a network packet it needs to send to the guest, the reverse procedure applies, using a separate "Receive" ring.

Amongst the factors that can limit network throughput are:

  1. the ring becoming full, causing netfront to have to wait before more data can be sent, and
  2. the netback process fully consuming an entire dom0 vCPU, meaning it cannot go any faster.

Multi-queue alleviates both of these potential bottlenecks.

What is multi-queue?

Rather than having a single Transmit and Receive ring per virtual interface (VIF), multi-queue means having multiple Transmit and Receive rings per VIF, and one netback thread for each:

[Figure: the multi-queue PV network datapath, with one netback thread per ring pair]

Now, each TCP stream has the opportunity to be driven through a different Transmit or Receive ring. The particular ring chosen for each stream is determined by a hash of the TCP header (MAC, IP and port number of both the source and destination).

Crucially, this means that separate netback threads can work on each TCP stream in parallel. So where we were previously limited by the capacity of a single dom0 vCPU to process packets, now we can exploit several dom0 vCPUs. And where the capacity of a single Transmit ring limited the total amount of data in-flight, the system can now support a larger amount.
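
You can see this structure from inside a Linux guest. The following commands are illustrative and assume the VIF appears as eth0:

# Each queue appears as an rx-N/tx-N pair under sysfs
ls /sys/class/net/eth0/queues/

# Each queue also gets its own event channels, which show up as separate
# interrupt lines for eth0
grep eth0 /proc/interrupts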

Which use-cases can take advantage of multi-queue?

Anything involving multiple TCP streams. For example, any kind of server VM that handles connections from more than one client at the same time.

Which guests can use multi-queue?

Since frontend changes are needed, the version of the guest's netfront driver matters. Although dom0 is geared up to support multi-queue, guests with old versions of netfront that lack multi-queue support are limited to single Transmit and Receive rings.

  • For Windows, the XenServer 7.0 xennet PV driver supports multi-queue.
  • For Linux, multi-queue support was added in Linux 3.16. This means that Debian Jessie 8.0 and Ubuntu 14.10 (or later) support multi-queue with their stock kernels. Over time, more and more distributions will pick up the relevant netfront changes.
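
To check whether a particular Linux guest is able to use multi-queue, the following illustrative commands can help (paths depend on how xen-netfront was built for that kernel):

# Multi-queue needs the netfront from Linux 3.16 or later
uname -r

# If xen-netfront is a module, it should expose a max_queues parameter
modinfo xen-netfront 2>/dev/null | grep -i max_queues
cat /sys/module/xen_netfront/parameters/max_queues 2>/dev/null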

How does the throughput scale with an increasing number of rings?

The following graph shows some measurements I made using iperf 2.0.5 between a pair of Debian 8.0 VMs both on a Dell R730xd host. The VMs each had 8 vCPUs, and iperf employed 8 threads each generating a separate TCP stream. The graph reports the sum of the 8 threads' throughputs, varying the number of queues configured on the guests' VIFs.

[Figure: aggregate throughput of 8 TCP streams versus number of queues per VIF]

We can make several observations from this graph:

  • The throughput scales well up to four queues, with four queues achieving more than double the throughput possible with a single queue.
  • The blip at five queues probably arose when the hashing algorithm failed to spread the eight TCP streams evenly across the queues, and is thus a measurement artefact. With different TCP port numbers, this may not have happened.
  • While the throughput generally increases with an increasing number of queues, the throughput is not proportional to the number of rings. Ideally, the throughput would double when you double the number of rings. This doesn't happen in practice because the processing is not perfectly parallelisable: netfront needs to demultiplex the streams onto the rings, and there are some overheads due to locking and synchronisation between queues.

This graph also highlights the substantial improvement over XenServer 6.5, in which only one queue per VIF was supported. In this use-case of eight TCP streams, XenServer 7.0 achieves 41 Gb/s out-of-the-box where XenServer 6.5 could manage only 17 Gb/s – an improvement of 140%.
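
For reference, the iperf 2.0.5 invocations used for a test like this look roughly as follows; the address is a placeholder for the receiving VM's IP:

# On the receiving VM
iperf -s

# On the sending VM: 8 parallel TCP streams for 60 seconds
iperf -c 10.0.0.2 -P 8 -t 60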

How many rings do I get by default?

By default the number of queues is limited by (a) the number of vCPUs the guest has and (b) the number of vCPUs dom0 has. A guest with four vCPUs will get four queues per VIF.

This is a sensible default, but if you want to manually override it, you can do so in the guest. In a Linux guest, add the parameter xen_netfront.max_queues=n, for some n, to the kernel command-line.
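
For example, on a Debian or Ubuntu guest that boots via GRUB, one illustrative way to set this persistently is:

# In /etc/default/grub, add the parameter to GRUB_CMDLINE_LINUX, e.g.
#   GRUB_CMDLINE_LINUX="xen_netfront.max_queues=8"
# then regenerate the GRUB configuration and reboot the guest
update-grub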


XenServer 7.0 performance improvements part 1: Lower latency storage datapath

The XenServer team made a number of significant performance and scalability improvements in the XenServer 7.0 release. This is the first in a series of articles that will describe the principal improvements.

Our first topic is storage I/O performance. A performance improvement has been achieved through the adoption of a polling technique in tapdisk3, the component of XenServer responsible for handling I/O on virtual storage devices. Measurements of one particular use-case demonstrate a 50% increase in performance from 15,000 IOPS to 22,500 IOPS.

What is polling?

Normally, tapdisk3 operates in an event-driven manner. Here is a summary of the first few steps required when a VM wants to do some storage I/O:

  1. The VM's paravirtualised storage driver (called blkfront in Linux or xenvbd in Windows) puts a request in the ring it shares with dom0.
  2. It sends tapdisk3 a notification via an event-channel.
  3. This notification is delivered to domain 0 by Xen as an interrupt. If domain 0 is not running, it will need to be scheduled in order to receive the interrupt.
  4. When it receives the interrupt, the domain 0 kernel schedules the corresponding backend process, tapdisk3, to run.
  5. When tapdisk3 runs, it looks at the contents of the shared-memory ring.
  6. Finally, tapdisk3 finds the request which can then be transformed into a physical I/O request.

Polling is an alternative to this approach in which tapdisk3 repeatedly looks in the ring, speculatively checking for new requests. This means that steps 2–4 can be skipped: there's no need to wait for an event-channel interrupt, nor for the tapdisk3 process to be scheduled, because it is already running. This enables tapdisk3 to pick up the request much more promptly, as it avoids the delays inherent in the event-driven approach.

The following diagram contrasts the timelines of these alternative approaches, showing how polling reduces the time until the request is picked up by the backend.

[Figure: timeline comparison of the event-driven and polling approaches]

How does polling help improve storage I/O performance?

Polling is an established technique for reducing latency in event-driven systems. (One example of where it is used elsewhere to mitigate interrupt latency is in Linux networking drivers that use NAPI.)

Servicing I/O requests promptly is an essential part of optimising I/O performance. As I discussed in my talk at the 2015 Xen Project Developer Summit, reducing latency is the key to maintaining a low virtualisation overhead. As physical I/O devices get faster and faster, any latency incurred in the virtualisation layer becomes increasingly noticeable and translates into lower throughputs.

An I/O request from a VM has a long journey to physical storage and back again. Polling in tapdisk3 optimises one section of that journey.

Isn't polling really CPU intensive, and thus harmful?

Yes it is, so we need to handle it carefully. If left unchecked, polling could easily eat huge quantities of domain 0 CPU time, starving other processes and causing overall system performance to drop.

We have chosen to do two things to avoid consuming too much CPU time:

  1. Poll the ring only when there's a good chance of a request appearing. Of course, guest behaviour is unpredictable in general, but some heuristics increase our chances of polling at the right time. For example, one assumption we adopt is that it's worth polling for a short time after the guest issues an I/O request: having issued one request, there's a good chance that it will issue another soon after. If this guess turns out to be correct, tapdisk3 keeps polling for a bit longer in case any more turn up. If none arrive for a while, it stops polling and temporarily falls back to the event-based approach.
  2. Don't poll if domain 0 is already very busy. Since polling is expensive in terms of CPU cycles, we only enter the polling loop if we are sure that it won't starve other processes of CPU time they may need.
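
If you want to observe this trade-off on your own host, a rough (illustrative) check is to watch how much dom0 CPU the tapdisk processes consume while a VM drives storage I/O:

# Show only the tapdisk processes in dom0; their %CPU rises while polling
top -p $(pgrep -d, tapdisk)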

How much faster does it go?

The benefit you will get from polling depends primarily on the latency of your physical storage device. If you are using an old mechanical hard-drive or an NFS share on a server on the other side of the planet, shaving a few microseconds off the journey through the virtualisation layer isn't going to make much of a difference. But on modern devices and low-latency network-based storage, polling can make a sizeable difference. This is especially true for smaller request sizes since these are most latency-sensitive.

For example, the following graph shows an improvement of 50% in single-threaded sequential read I/O for small request sizes – from 15,000 IOPS to 22,500 IOPS. These measurements were made with iometer in a 32-bit Windows 7 SP1 VM on a Dell PowerEdge R730xd with an Intel P3700 NVMe drive.

[Figure: single-threaded sequential read IOPS for small request sizes, before and after the polling improvement]

How was polling implemented?

The code to add polling to tapdisk3 can be found in the following set of commits: https://github.com/xapi-project/blktap/pull/179/commits.


XenServer's LUN scalability

"How many VMs can coexist within a single LUN?"

An important consideration when planning a deployment of VMs on XenServer is the sizing of your storage repositories (SRs). The question above is one I often hear. Is the performance acceptable if you have more than a handful of VMs in a single SR? And will some VMs perform well while others suffer?

In the past, XenServer's SRs didn't always scale well, so it was not advisable to cram too many VMs into a single LUN. But all that changed in XenServer 6.2, which allows excellent scalability up to very large numbers of VMs. And the subsequent 6.5 release made things even better.

The following graph shows the total throughput enjoyed by varying numbers of VMs doing I/O to their VDIs in parallel, where all VDIs are in a single SR.

[Figure: aggregate throughput versus number of VMs performing I/O to a single SR, XenServer 6.1 versus 6.5]

In XenServer 6.1 (blue line), a single VM would experience a modest 240 MB/s. But, counter-intuitively, adding more VMs to the same SR would cause the total to fall, reaching a low point around 20 VMs achieving a total of only 30 MB/s – an average of only 1.5 MB/s each!

On the other hand, in XenServer 6.5 (red line), a single VM achieves 600 MB/s, and it only requires three or four VMs to max out the LUN's capabilities at 820 MB/s. Crucially, adding further VMs no longer causes the total throughput to fall; instead it remains constant at the maximum rate.

And how well distributed was the available throughput? Even with 100 VMs, the available throughput was spread very evenly -- on XenServer 6.5 with 100 VMs in a LUN, the highest average throughput achieved by a single VM was only 2% greater than the lowest. The following graph shows how consistently the available throughput is distributed amongst the VMs in each case:

[Figure: distribution of per-VM throughput with many VMs sharing a single LUN]

Specifics

  • Host: Dell R720 (2 x Xeon E5-2620 v2 @ 2.1 GHz, 64 GB RAM)
  • SR: Hardware HBA using FibreChannel to a single LUN on a Pure Storage 420 SAN
  • VMs: Debian 6.0 32-bit
  • I/O pattern in each VM: 4 MB sequential reads (O_DIRECT, queue-depth 1, single thread). The graph above has a similar shape for smaller block sizes and for writes.
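
To reproduce this I/O pattern inside a Linux VM, a rough equivalent is the following; the device path is a placeholder for the VDI under test:

# 4 MB sequential direct reads, queue depth 1, single thread
dd if=/dev/xvdb of=/dev/null bs=4M iflag=direct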

How did we increase VM density in XenServer 6.2? (part 2)

In a previous article, I described how dom0 event channels can cause a hard limitation on VM density scalability.

Event channels were just one hard limit the XenServer engineering team needed to overcome to allow XenServer 6.2 to support up to 500 Windows VMs or 650 Linux VMs on a single host.

In my talk at the 2013 Xen Developer Summit towards the end of October, I spoke about a further six hard limits and some soft limits that we overcame along the way to achieving this goal. This blog article summarises that journey.

Firstly, I'll explain what I mean by hard and soft VM density limits. A hard limit is where you can run a certain number of VMs without any trouble, but you are unable to run one more. Hard limits arise when there is some finite, unsharable resource that each VM consumes a bit of. On the other hand, a soft limit is where performance degrades with every additional VM you have running; there will be a point at which it's impractical to run more than a certain number of VMs because they will be unusable in some sense. Soft limits arise when there is a shared resource that all VMs must compete for, such as CPU time.

Here is a run-down of all seven hard limits, how we mitigated them in XenServer 6.2, and how we might be able to push them even further back in future:

  1. dom0 event channels

    • Cause of limitation: XenServer uses a 32-bit dom0. This means a maximum of 1,024 dom0 event channels.
    • Mitigation for XenServer 6.2: We made a special case for dom0 to allow it up to 4,096 dom0 event channels.
    • Mitigation for future: Adopt David Vrabel's proposed change to the Xen ABI to provide unlimited event channels.
  2. blktap2 device minor numbers

    • Cause of limitation: blktap2 only supports up to 1,024 minor numbers, caused by #define MAX_BLKTAP_DEVICE in blktap.h.
    • Mitigation for XenServer 6.2: We doubled that constant to allow up to 2,048 devices.
    • Mitigation for future: Move away from blktap2 altogether?
  3. aio requests in dom0

    • Cause of limitation: Each blktap2 instance creates an asynchronous I/O context for receiving 402 events; the default system-wide number of aio requests (fs.aio-max-nr) was 444,416 in XenServer 6.1.
    • Mitigation for XenServer 6.2: We set fs.aio-max-nr to 1,048,576 (see the sysctl sketch after this list).
    • Mitigation for future: Increase this parameter yet further. It's not clear whether there's a ceiling, but it looks like this would be okay.
  4. dom0 grant references

    • Cause of limitation: Windows VMs used receive-side copy (RSC) by default in XenServer 6.1. In netbk_p1_setup, netback allocates 22 grant-table entries per virtual interface for RSC. But dom0 only had a total of 8,192 grant-table entries in XenServer 6.1.
    • Mitigation for XenServer 6.2: We could have increased the size of the grant-table, but for other reasons RSC is no longer the default for Windows VMs in XenServer 6.2, so this limitation no longer applies.
    • Mitigation for future: Continue to leave RSC disabled by default.
  5. Connections to xenstored

    • Cause of limitation: xenstored uses select(2), which can only listen on up to 1,024 file descriptors; qemu opens 3 file descriptors to xenstored.
    • Mitigation for XenServer 6.2: We made two qemu watches share a connection.
    • Mitigation for future: We could modify xenstored to accept more connections, but in the future we expect to be using upstream qemu, which doesn't connect to xenstored, so it's unlikely that xenstored will run out of connections.
  6. Connections to consoled

    • Cause of limitation: Similarly, consoled uses select(2), and each PV domain opens 3 file descriptors to consoled.
    • Mitigation for XenServer 6.2: We made consoled use poll(2) rather than select(2), which has no such limitation.
    • Mitigation for future: Continue to use poll(2).
  7. dom0 low memory

    • Cause of limitation: Each running VM eats about 1 MB of dom0 low memory.
    • Mitigation for future: Using a 64-bit dom0 would remove this limit.
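
As mentioned in item 3 above, the aio limit is an ordinary sysctl, so it is easy to inspect. Here is a minimal sketch of how you could check and raise it in dom0 (the value shown is the XenServer 6.2 default; persist it via /etc/sysctl.conf if needed):

# Current limit and current usage
sysctl fs.aio-max-nr fs.aio-nr

# Raise the system-wide maximum
sysctl -w fs.aio-max-nr=1048576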

Summary of limits

Okay, so what does this all mean in terms of how many VMs you can run on a host? Well, since some of the limits concern your VM configuration, it depends on the type of VM you have in mind.

Let's take the example of Windows VMs with PV drivers, each with 1 vCPU, 3 disks and 1 network interface. Here are the number of those VMs you'd have to run on a host in order to hit each limitation:

Limitation              XS 6.1     XS 6.2     Future
dom0 event channels     150        570        no limit
blktap minor numbers    341        682        no limit
aio requests            368        869        no limit
dom0 grant references   372        no limit   no limit
xenstored connections   333        500        no limit
consoled connections    no limit   no limit   no limit
dom0 low memory         650        650        no limit

The first limit you'd arrive at in each release is the smallest number in its column. So the overall limit is event channels in XenServer 6.1, limiting us to 150 of these VMs. In XenServer 6.2, it's the number of xenstored connections that limits us to 500 VMs per host. In the future, none of these limits will hit us, but there will surely be an eighth limit when running many more than 500 VMs on a host.

What about Linux guests? Here's where we stand for paravirtualised Linux VMs each with 1 vCPU, 1 disk and 1 network interface:

Limitation              XS 6.1     XS 6.2     Future
dom0 event channels     225        1000       no limit
blktap minor numbers    1024       2048       no limit
aio requests            368        869        no limit
dom0 grant references   no limit   no limit   no limit
xenstored connections   no limit   no limit   no limit
consoled connections    341        no limit   no limit
dom0 low memory         650        650        no limit

This explains why the supported limit for Linux guests can be as high as 650 in XenServer 6.2. Again, in the future, we'll likely be limited by something else above 650 VMs.

What about the soft limits?

After having pushed the hard limits such a long way out, we then needed to turn our attention towards ensuring that there weren't any soft limits that would make it infeasible to run a large number of VMs in practice.

Felipe Franciosi has already described how qemu's utilisation of dom0 CPUs can be reduced by avoiding the emulation of unneeded virtual devices. The other major change in XenServer 6.2 to reduce dom0 load was to reduce the amount of xenstore traffic. This was achieved by replacing code that polled xenstore with code that registers watches on xenstore and by removing some spurious xenstore accesses from the Windows guest agent.

These things combine to keep dom0 CPU load down to a very low level. This means that VMs can remain healthy and responsive, even when a very large number of them are running on the host.


How did we increase VM density in XenServer 6.2?

One of the most noteworthy improvements in XenServer 6.2 is the support for a significantly increased number of VMs running on a host: now up to 500 Windows VMs or 650 Linux VMs.

We needed to remove several obstacles in order to achieve this huge step up. Perhaps the most important of the technical changes that led to this was the increase in the number of event channels available to dom0 (the control domain) from 1024 to 4096. This blog post is an attempt to shed some light on what these event channels are, and why they play a key role in VM density limits.

What is an event channel?

It's a channel for communications between a pair of VMs. An event channel is typically used by one VM to notify another VM about something. For example, a VM's paravirtualised disk driver would use an event channel to notify dom0 of the presence of newly written data in a region of memory shared with dom0.

Here are the various things that a VM requires an event channel for:

  • one per virtual disk;
  • one per virtual network interface;
  • one for communications with xenstore;
  • for HVM guests, one per virtual CPU (rising to two in XenServer 6.2); and
  • for PV guests, one to communicate with the console daemon.


Therefore a VM will typically require at least four dom0 event channels, depending on its configuration, and requiring more than ten is not uncommon.

Why can event channels cause scalability problems when trying to run lots of VMs?

The total number of event channels any domain can use is part of a shared structure in the interface between a paravirtualised VM and the hypervisor; it is fixed at 1024 for 32-bit domains such as XenServer's dom0. Moreover, around 50--100 event channels are normally used for other purposes, such as physical interrupts; this number is related to the number of physical devices in your host. This overhead means that in practice there may be little more than 900--950 event channels available for VM use. So the number of available event channels becomes a limited resource that can cause you to hit a hard limit on the number of VMs you can run on a host.

To take an example: Before XenServer 6.2, if each of your VMs requires 6 dom0 event channels (e.g. an HVM guest with 3 virtual disks, 1 virtual network interface and 1 virtual CPU) then you'll probably find yourself running out of dom0 event channels if you go much over 150 VMs.

In XenServer 6.2, we have made a special case for dom0, allowing it to behave differently from other 32-bit domains and use up to four times the normal number of event channels. Hence there are now a total of 4096 dom0 event channels available.

So, on XenServer 6.2 in the same scenario as the example above, even though each VM of this type would now use 7 dom0 event channels, the increased total number of dom0 event channels means you'd have to run over 570 of them before running out.

What happens when I run out of event channels?

On VM startup, the XenServer toolstack will try to plumb all the event channels through from dom0 to the nascent VM. If there are no spare slots, the connection will fail. The exact failure mode depends on which subsystem the event channel was intended for use in, but you may see error messages like these when the toolstack tries to connect up the next event channel after having run out:

error 28 mapping ring-refs and evtchn
message: xenopsd internal error: Device.Ioemu_failed("qemu-dm exited unexpectedly")

In other words, it's not pretty. The VM either won't boot or will run with reduced functionality.

That sounds scary. How can I tell whether there are sufficient spare event channels to start another VM?

XenServer has a utility called "lsevtchn" that allows you to inspect the event channel plumbing.

In dom0, run the following command to see what event channels are connected to a particular domain.

/usr/lib/xen/bin/lsevtchn <domid>

For example, here is the output from a PV domain with domid 36:

[root@xs62 ~]# /usr/lib/xen/bin/lsevtchn 36
   1: VCPU 0: Interdomain (Connected) - Remote Domain 0, Port 51
   2: VCPU 0: Interdomain (Connected) - Remote Domain 0, Port 52
   3: VCPU 0: Virtual IRQ 0
   4: VCPU 0: IPI
   5: VCPU 0: IPI
   6: VCPU 0: Virtual IRQ 1
   7: VCPU 0: IPI
   8: VCPU 0: Interdomain (Connected) - Remote Domain 0, Port 55
   9: VCPU 0: Interdomain (Connected) - Remote Domain 0, Port 53
  10: VCPU 0: Interdomain (Connected) - Remote Domain 0, Port 54
  11: VCPU 0: Interdomain (Connected) - Remote Domain 0, Port 56

You can see that six of this VM's event channels are connected to dom0.

But the domain we are most interested in is dom0. The total number of event channels connected to dom0 can be determined by running

/usr/lib/xen/bin/lsevtchn 0 | wc -l

Before XenServer 6.2, if that number is close to 1024 then your host is on the verge of not being able to run an additional VM. On XenServer 6.2, the number to watch out for is 4096. However, before you get enough VMs up and running to approach that limit, you are likely to run into various other limits, depending on configuration and workload. Watch out for further blog posts describing how we have cleared more of these hurdles in XenServer 6.2.
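
Putting these numbers together, a rough headroom estimate can be scripted in dom0 along the following lines; this is only a sketch, and the 7-channels-per-VM figure is taken from the Windows example above, so substitute whatever your own VMs require:

# Approximate number of additional VMs before dom0 event channels run out
LIMIT=4096                                    # use 1024 on releases before XenServer 6.2
IN_USE=$(/usr/lib/xen/bin/lsevtchn 0 | wc -l)
PER_VM=7                                      # dom0 event channels needed per VM of this type
echo "Room for roughly $(( (LIMIT - IN_USE) / PER_VM )) more such VMs"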

Continue reading
66344 Hits
0 Comments

About XenServer

XenServer is the leading open source virtualization platform, powered by the Xen Project hypervisor and the XAPI toolstack. It is used in the world's largest clouds and enterprises.
 
Commercial support for XenServer is available from Citrix.