While removing the architectural obstacles that prevented XenServer 6.2.0 from running up to 500 HVM guests per host, we made an interesting observation: even apparently idle Windows 7 VMs cause some amount of work in dom0. With careful analysis and optimisations in the way we emulate virtual hardware for such VMs, we managed to eliminate most of this work for guests that are idle. This allows us to run 500 Windows VMs and keep dom0 load average practically at zero (after the VMs have finished booting).
The busy bees
We started by observing the CPU utilisation of the QEMU process that is responsible for the hardware emulation of a Windows 7 guest. Even when the guest was just waiting for login and not running disk indexing services, Windows Update or other known I/O operations, the QEMU process would consistently consume between 1% and 3% of CPU time in dom0 (depending mostly on the host's characteristics). In this scenario, there is one QEMU process for each HVM guest running on a host.
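A measurement of this kind can be approximated from dom0 with standard tools. The sketch below is not the method we used: the helper name `sum_cpu_matching` is hypothetical, and the pattern `qemu-dm` is an assumption about how the QEMU processes are named on your host. Note also that the `%CPU` column of `ps` is averaged over each process's lifetime, so it smooths out short bursts.

```shell
# sum_cpu_matching PATTERN
# Reads "<pcpu> <command line>" records on stdin and prints
# "<process count> <total %CPU>" for commands matching PATTERN.
sum_cpu_matching() {
    awk -v pat="$1" '
        $2 ~ pat { total += $1; n++ }          # $2 is the command name or path
        END      { printf "%d %.1f\n", n, total }
    '
}

# Example on a live host (pattern is an assumption, adjust as needed):
#   ps -eo pcpu=,args= | sum_cpu_matching 'qemu-dm'
```

Dividing the total by the process count gives a rough per-guest figure to compare against the 1% to 3% range quoted above.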
When we succeeded at overcoming the hard limits that prevented us from starting 500 guests (see this post), we soon realised that the small amount of work done by each QEMU would quickly add up. With 3% of dom0 vCPU consumption per running Windows 7 VM, it takes just over 130 VMs to completely max out four dom0 vCPUs. At that point, dom0's load average quickly increases and the host's performance becomes compromised.
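The saturation point quoted above follows from simple arithmetic. As a rough sketch (the function name `vms_to_saturate` is hypothetical, and the model assumes every VM costs the same fixed share of one vCPU):

```shell
# vms_to_saturate VCPUS PERCENT_PER_VM
# Smallest whole number of VMs whose combined QEMU load reaches
# VCPUS fully busy dom0 vCPUs (each vCPU counts as 100%).
vms_to_saturate() {
    vcpus="$1"
    pct="$2"
    # Integer ceiling of (vcpus * 100) / pct.
    echo $(( (vcpus * 100 + pct - 1) / pct ))
}

vms_to_saturate 4 3   # prints 134
vms_to_saturate 4 1   # prints 400
```

Even at the 1% end of the observed range, four dom0 vCPUs would be exhausted at 400 VMs, which is still short of the 500-guest target.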
Without available CPU time, dom0 is unable to promptly serve network and storage I/O requests. Other control plane operations are also affected since the toolstack (e.g. xapi, xenopsd, xenstore) will struggle to run and respond to requests from XenCenter or other clients.
In order to investigate why those QEMUs were "spinning" so much, we created Project Pulsar (named after pulsars, which are rapidly spinning neutron stars). The project conducted a detailed analysis of every event that disturbed QEMU from its otherwise sleeping state.
With careful debugging, we learned that there are two categories of events that cause QEMU to wake up. Firstly, there are internal timers that fire several times a second to poll for certain events (e.g. buffered I/O). Secondly, there are actual interrupts generated by guests to check on certain virtual hardware (e.g. USB and the parallel port).
The following table shows the number of events disturbing QEMU during the lifetime of a Windows 7 VM running for 5 minutes. The events are grouped into 30-second intervals for simplicity. This makes it easy to see the boot period (within the first 30-second interval), the time during which the VM is supposedly idle, and the shutdown phase at the end. We separated the columns into Timer, Read and Write Events. Timer events are internal to QEMU, while Read and Write events come from the VM as interrupts. After the PV drivers are loaded (which happens during boot), storage and network I/O are no longer handled by QEMU and are therefore no longer counted.
| 30-Second Interval | Timer Events | Read Events | Write Events |
While certain events only happen during the guest initialisation or shortly after the boot completed (e.g. parallel port scans), others remain constant throughout the life of the VM (e.g. USB protocols).
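For readers who want to produce a similar per-interval breakdown from their own traces, the grouping into 30-second intervals can be sketched as follows. The input format is purely hypothetical (one event per line, written as seconds since VM start followed by `timer`, `read` or `write`), and `bucket_events` is an illustrative helper, not the tooling used by Project Pulsar.

```shell
# bucket_events
# Reads "<seconds> <timer|read|write>" records on stdin and prints one
# line per 30-second interval: "<start>-<end> <timer> <read> <write>".
bucket_events() {
    awk '
        {
            b = int($1 / 30)              # index of the 30-second interval
            if (b > max) max = b
            count[b, $2]++
        }
        END {
            if (NR == 0) exit             # no events, no output
            for (b = 0; b <= max; b++)
                printf "%d-%d %d %d %d\n", b * 30, (b + 1) * 30,
                       count[b, "timer"], count[b, "read"], count[b, "write"]
        }
    '
}

# Example with four made-up events:
printf '1 timer\n2 read\n31 timer\n45 timer\n' | bucket_events
# prints:
#   0-30 1 1 0
#   30-60 2 0 0
```

Intervals with no events still appear, with zero counts, so an idle stretch shows up as a row of zeros rather than a gap.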
Idling down dom0
To address the first category of events that disturb QEMU (i.e. internal timers), we first studied why they were required in the first place. As it turned out, newer versions of QEMU had already been patched to disable some of these. A good example is buffered I/O, which used to require polling. With the creation of a dedicated event channel for buffered I/O interrupts (see this patch) and a corresponding change to the device model (see this patch), QEMU no longer needs to poll.
The other timers we identified are necessary only when certain features are enabled. These are the QEMU monitor (which allows debugging tools to be hooked up to QEMU processes), serial port emulation and VNC connections. Since XenServer supports neither QEMU debugging via its monitor feature nor direct serial connections to HVM guests, the first two could be safely disabled. The last timer is only active while a VNC connection to a guest is established (e.g. via XenCenter).
The second category of events happens due to the very nature of the hardware that is being emulated. If we present a Windows VM with an IDE DVD-ROM drive, the guest OS handles this (virtual) drive as a real drive. For example, it will poll the drive every so often to check whether the media has changed. Similar types of interrupts will be initiated by the guest to communicate with other emulated hardware.
In order to address these, we modified Xapi to allow the emulation of USB, parallel and serial ports to be turned off. Parallel port emulation falls into the same category as the serial port (i.e. there is no supported way to plug virtual devices into these ports in a guest), and turning off the emulation of both is fairly safe and has no visible symptoms.
Disabling USB, however, may have side effects. When using VNC to connect to a guest's console, a USB tablet device driver is used to provide absolute coordinates of the mouse on the screen. Without this USB driver, VNC falls back to PS/2 emulation, which can only provide relative mouse positioning. The side effect is that, without USB, the mouse pointer of the VNC client will very likely be misaligned with the mouse pointer in the guest. This makes the console very hard to use.
The good news is that the Windows Remote Desktop Protocol (RDP) does not rely on the USB tablet driver. If the guest is configured to allow RDP connections, the USB emulation can be disabled without this side effect. When available, XenCenter already prefers RDP over VNC connections to Windows VMs by default.
Configuring the toolstack
The recommendations of Project Pulsar were adopted as defaults wherever possible and have been incorporated in XenServer 6.2.0. These included changes not visible to the VM (such as QEMU internal timers). However, we decided not to change the virtual hardware presented to VMs unless this is explicitly configured.
In order to configure Xapi to disable the Serial Port emulation, use the following command:
xe vm-param-set uuid=<vm-uuid> platform:hvm_serial=none
Similarly, the Parallel Port emulation can be disabled as follows:
xe vm-param-set uuid=<vm-uuid> platform:parallel=none
Finally, the USB emulation can be disabled as follows:
xe vm-param-set uuid=<vm-uuid> platform:usb=false
xe vm-param-set uuid=<vm-uuid> platform:usb_tablet=false
Note that two commands are necessary to completely disable the USB emulation. These disable both the virtual USB hub and the virtual tablet device used for positioning the mouse.
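When the same settings need to be applied to many VMs, the four commands above can be generated per VM. The following is a minimal sketch: `pulsar_commands` is a hypothetical helper name, not part of the `xe` CLI, and it only prints the commands shown in this post so they can be reviewed before anything is changed.

```shell
# pulsar_commands VM_UUID
# Prints (does not run) the xe commands from this post that disable
# serial, parallel and USB emulation for one VM.
pulsar_commands() {
    uuid="$1"
    if [ -z "$uuid" ]; then
        echo "usage: pulsar_commands <vm-uuid>" >&2
        return 1
    fi
    for setting in hvm_serial=none parallel=none usb=false usb_tablet=false; do
        echo "xe vm-param-set uuid=$uuid platform:$setting"
    done
}

# Review the output first, then pipe it to a shell to apply it:
#   pulsar_commands <vm-uuid>
#   pulsar_commands <vm-uuid> | sh
```

Keep the USB caveat in mind: the two `usb` settings are only appropriate for guests whose console is reached over RDP rather than VNC.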
The best way to disable the emulation of the DVD-ROM drive is to delete the associated VBD. For information on how to do that, refer to Section A.4.23.3 of the XenServer 6.2.0 Administrator's Guide.
The following table shows the number of events disturbing QEMU during the lifetime of a Windows 7 VM running for 5 minutes. This is the same VM used for the numbers in the table above, except that this time we incorporated the timer patches in QEMU and disabled the emulation of the USB, DVD-ROM, monitor, parallel port and serial port.
| 30-Second Interval | Timer Events | Read Events | Write Events |
With these figures, we are able to start 500 Windows 7 VMs on one host and keep dom0 load average practically at zero (after the VMs have booted).