Virtualization Blog

Discussions and observations on virtualization.

XenServer's LUN scalability

"How many VMs can coexist within a single LUN?"

An important consideration when planning a deployment of VMs on XenServer is the sizing of your storage repositories (SRs). The question above is one I hear often. Is performance acceptable if you have more than a handful of VMs in a single SR? And will some VMs perform well while others suffer?

In the past, XenServer's SRs didn't always scale well, so it was not advisable to cram too many VMs into a single LUN. But all that changed in XenServer 6.2, which brought excellent scalability up to very large numbers of VMs. And the subsequent 6.5 release made things even better.

The following graph shows the total throughput enjoyed by varying numbers of VMs doing I/O to their VDIs in parallel, where all VDIs are in a single SR.

[Graph: total throughput vs. number of VMs performing parallel I/O to VDIs in a single SR, XenServer 6.1 vs. 6.5]

In XenServer 6.1 (blue line), a single VM would achieve a modest 240 MB/s. But, counter-intuitively, adding more VMs to the same SR caused the total to fall, reaching a low point at around 20 VMs, where the aggregate was only 30 MB/s, an average of just 1.5 MB/s per VM!

On the other hand, in XenServer 6.5 (red line), a single VM achieves 600 MB/s, and it takes only three or four VMs to max out the LUN's capabilities at 820 MB/s. Crucially, adding further VMs no longer causes the total throughput to fall; it remains constant at the maximum rate.

And how well distributed was the available throughput? Even with 100 VMs, it was spread very evenly: on XenServer 6.5 with 100 VMs in a LUN, the highest average throughput achieved by a single VM was only 2% greater than the lowest. The following graph shows how consistently the available throughput is distributed amongst the VMs in each case:

[Graph: distribution of per-VM throughput amongst the VMs sharing a single LUN, XenServer 6.1 vs. 6.5]
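For anyone repeating this kind of measurement, a quick way to quantify how evenly the throughput is shared is to compare the highest and lowest per-VM averages. The figures in this sketch are made up purely for illustration, not measured data:

```python
# Per-VM average throughput in MB/s; illustrative values, not measured data.
per_vm_mb_s = [8.3, 8.2, 8.2, 8.1, 8.2, 8.3, 8.1, 8.2]

highest, lowest = max(per_vm_mb_s), min(per_vm_mb_s)
spread_pct = (highest - lowest) / lowest * 100
print(f"Fastest VM is {spread_pct:.1f}% faster than the slowest")
```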

Specifics

  • Host: Dell R720 (2 x Xeon E5-2620 v2 @ 2.1 GHz, 64 GB RAM)
  • SR: Hardware HBA using Fibre Channel to a single LUN on a Pure Storage 420 SAN
  • VMs: Debian 6.0 32-bit
  • I/O pattern in each VM: 4 MB sequential reads (O_DIRECT, queue depth 1, single thread); a minimal sketch reproducing this pattern appears below. The graph above has a similar shape for smaller block sizes and for writes.
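For readers who want to reproduce a comparable access pattern in their own VMs, here is a minimal, illustrative Python equivalent of the workload described above (4 MiB sequential reads, O_DIRECT, queue depth 1, single thread). This is not the tool used for the measurements; the device path and read count are placeholders to adjust for your own environment.

```python
import mmap
import os

# Placeholders: point DEVICE at a test block device or a large file inside the VM.
DEVICE = "/dev/xvdb"          # hypothetical secondary disk in the guest
BLOCK_SIZE = 4 * 1024 * 1024  # 4 MiB per request, matching the test above
NUM_READS = 256               # 256 x 4 MiB = 1 GiB read in total

def sequential_direct_reads(path, block_size, num_reads):
    """Issue single-threaded sequential reads with O_DIRECT and queue depth 1."""
    # O_DIRECT bypasses the guest page cache, so every read reaches the storage stack.
    fd = os.open(path, os.O_RDONLY | os.O_DIRECT)
    # O_DIRECT needs a suitably aligned buffer; an anonymous mmap is page-aligned.
    buf = mmap.mmap(-1, block_size)
    total = 0
    try:
        for _ in range(num_reads):
            n = os.readv(fd, [buf])   # blocking read, so only one request in flight
            total += n
            if n < block_size:        # short read means end of device/file
                break
    finally:
        buf.close()
        os.close(fd)
    return total

if __name__ == "__main__":
    read_bytes = sequential_direct_reads(DEVICE, BLOCK_SIZE, NUM_READS)
    print(f"Read {read_bytes / (1024 * 1024):.0f} MiB sequentially with O_DIRECT")
```

Timing a run of this in each VM, first one VM at a time and then with many VMs in parallel, gives per-VM and aggregate throughput figures comparable in spirit to the graphs above.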

Comments (6)

Tobias Kreidl on Friday, 26 June 2015 19:52

Very nice, Jonathan, and it is always good to raise discussions about standards that are known to change over time. This is particularly important when planning projects involving large numbers of VMs, such as XenDesktop configurations. One thing to keep in mind, obviously, is that at some point the LUN itself will run out of the ability to handle the increased I/O load, and that is of course a good time to create a separate LUN, using entirely different disks, to handle the expansion. It is also worth noting that the type of storage (iSCSI, NFS, etc.), the underlying disks (SAS vs. SATA, spinning disk vs. SSD, etc.) and of course the configuration (RAID type, number of disks in the RAID set, any cache or SDS options, etc.) will all ultimately dictate the limits of an individual configuration. Running tests to evaluate one's specific configuration is always a good idea, and naturally the XenServer dom0 settings might also need some changes at some point.

Guest - John on Saturday, 27 June 2015 01:15

Wouldn't this also depend on what the SR backend is comprised of? E.g. lvm, nfs, iscsi or lvmohba? From what I've heard, with lvmohba more LUNs is better because you're adding more scsi queues.

Tobias Kreidl on Saturday, 27 June 2015 04:27

Indeed, depending on the specific characteristics of each storage array, there will be some maximum queue depth per connection (port). The number of I/O requests per port that can be handled will greatly affect performance. Storage ports have vastly different queue depths, anywhere from a few hundred per port to several thousand. The number of initiators a single port can support depends on the number of available queues. A typical LUN queue depth is 32 to 64, so you can balance this with either a small number of initiators and many LUNs, or a large number of initiators (such as HBA or iSCSI connections) and a small number of LUNs. Whatever the combination, exceeding the queue depth will rapidly result in degraded performance.
In this particular case, perhaps this limit was not exceeded, hence there was no evidence of degraded performance?
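As a rough back-of-the-envelope illustration of that balancing act, the sketch below uses purely hypothetical figures; real queue depths vary by array, port, HBA and firmware:

```python
# Hypothetical figures; real values depend on the array, port and firmware.
port_queue_depth = 2048      # outstanding I/Os a single storage port can accept
lun_queue_depth = 32         # typical per-LUN queue depth (often 32 to 64)

# How many LUN/initiator pairs the port can service before its queue saturates.
pairs_per_port = port_queue_depth // lun_queue_depth
print(f"~{pairs_per_port} LUN-initiator pairs per port before queuing at the port")

# Example: 16 initiators each addressing 4 LUNs through this port could, in the
# worst case, keep 16 * 4 * 32 = 2048 requests outstanding, exactly at the limit.
initiators, luns_per_initiator = 16, 4
worst_case = initiators * luns_per_initiator * lun_queue_depth
print(f"Worst-case outstanding I/Os: {worst_case} (port limit: {port_queue_depth})")
```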

Jonathan Davies on Monday, 29 June 2015 08:53

Thanks for your comments, Tobias and John. You're absolutely right -- the LUN's capabilities are an important consideration. And now that XenServer doesn't get in the way and impose an additional scalability limitation, it's the only remaining consideration.

Martin Zugec on Tuesday, 30 June 2015 17:05

Hi Jonathan,

Great blog, I'm really glad to see these numbers published. The VMs-per-LUN question has been floating around for a while, and now we have a great link to refer people to.

Martin

Guest - DVillar on Tuesday, 31 May 2016 21:58

What utility can I use to see VM throughput in my own environment, like the graph you have above? I am running a XenServer 6.5 pool with 10 hosts and approximately 120 Windows 2008 R2 VMs. All of the VMs live in one SR, but I am curious whether having more than one SR in the pool would provide a performance benefit.

