A XenApp virtual machine (VM) hosted on XenServer and configured with two concurrent graphics processing engines in passthrough mode is shown to work reliably. Assigning both engines to a single XenApp VM, rather than spreading them over two separate XenApp VMs, provides more flexibility, saves operating system licensing costs and could, in principle, be extended to incorporate additional GPU engines.
A XenApp virtual machine (VM) that supports two or more concurrent graphics processing units (GPUs) has a number of advantages over running separate VM instances, each with its own GPU engine. For one, if users happen to be unevenly distributed across XenApp instances, some XenApp VMs may sit idle while others are overloaded, to the detriment of users associated with the busy instances. It is also simpler to add capacity to such a VM than to build and license yet another Windows Server VM. This study made use of an NVIDIA GRID K2 (driver release 340.66), which comprises two Kepler GK104 engines and 8 GB of GDDR5 RAM (4 GB per GPU). The card was hosted in a Dell R720 with dual Intel Xeon E5-2680 v2 CPUs (40 vCPUs in total, hyperthreaded) running XenServer 6.2 SP1, with XenApp 7.6 running as a VM with 16 vCPUs and 16 GB of memory on Windows 2012 R2 Datacenter.
It is important to note that these steps constitute changes that are not officially supported by Citrix or NVIDIA and are to be regarded as purely experimental at this stage.
Registry changes to XenApp were made according to the instructions provided in the Citrix Product Documentation.
On the XenServer, first list devices and look for GRID instances:
# lspci|grep -i nvid
44:00.0 VGA compatible controller: NVIDIA Corporation GK104GL [GRID K2] (rev a1)
45:00.0 VGA compatible controller: NVIDIA Corporation GK104GL [GRID K2] (rev a1)
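The bus addresses in the first column are what the later steps refer to. As a minimal sketch (the function name grid_bdfs is hypothetical, and PCI domain 0000 is assumed, consistent with the addresses used elsewhere in this article), the addresses could be pulled out of the lspci output as follows. The two sample lines are copied from the listing above so the snippet can be tried away from the host:

```shell
#!/bin/bash
# Hypothetical helper: reads lspci output on stdin and prints the address of
# every NVIDIA device, one per line, prefixed with the assumed domain 0000.
grid_bdfs() {
    awk 'tolower($0) ~ /nvidia/ { print "0000:" $1 }'
}

# Sample input copied from the lspci listing in this article.
sample='44:00.0 VGA compatible controller: NVIDIA Corporation GK104GL [GRID K2] (rev a1)
45:00.0 VGA compatible controller: NVIDIA Corporation GK104GL [GRID K2] (rev a1)'

printf '%s\n' "$sample" | grid_bdfs
```

On the XenServer itself the function would be fed from the real command, as in "lspci | grid_bdfs".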
Next, get the UUID of the VM:
# xe vm-list
uuid ( RO) : 0c8a22cf-461f-0030-44df-2e56e9ac00a4
name-label ( RW): TST-Win7-vmtst1
power-state ( RO): running
uuid ( RO) : 934c889e-ebe9-b85f-175c-9aab0628667c
name-label ( RW): DEV-xapp
power-state ( RO): running
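When many VMs are listed, picking the right UUID by eye is error-prone. The following is a rough sketch, not an official tool: the function name vm_uuid_by_name is hypothetical, and it assumes the listing has the same "uuid" and "name-label" lines shown above. The sample text is copied from that listing.

```shell
#!/bin/bash
# Hypothetical helper: reads `xe vm-list` output on stdin and prints the UUID
# of the VM whose name-label equals $1.
vm_uuid_by_name() {
    awk -v want="$1" '
        $1 == "uuid"       { uuid = $NF }            # remember the latest UUID
        $1 == "name-label" {
            name = $0
            sub(/^[^:]*:[ \t]*/, "", name)          # keep only the label text
            if (name == want) print uuid
        }'
}

# Sample input copied from the vm-list output in this article.
listing='uuid ( RO) : 0c8a22cf-461f-0030-44df-2e56e9ac00a4
name-label ( RW): TST-Win7-vmtst1
power-state ( RO): running
uuid ( RO) : 934c889e-ebe9-b85f-175c-9aab0628667c
name-label ( RW): DEV-xapp
power-state ( RO): running'

printf '%s\n' "$listing" | vm_uuid_by_name DEV-xapp
```

On the host, the xe CLI can usually do this filtering itself, for instance with "xe vm-list name-label=DEV-xapp params=uuid --minimal".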
Get the address of the existing GPU engine, if one is currently associated:
# xe vm-param-get param-name=other-config uuid=934c889e-ebe9-b85f-175c-9aab0628667c
vgpu_pci: 0/0000:44:00.0; pci: 0/0000:44:0.0; mac_seed: d229f84d-73cc-e5a5-d105-f5a3e87b82b7; install-methods: cdrom; base_template_name: Windows Server 2012 (64-bit)
(Note: ignore any vgpu_pci parameters that are irrelevant now to this process, but may be left over from earlier procedures and experiments.)
Dissociate the GPU via XenCenter or via the CLI by setting the GPU type to “none”.
Then, add both GPU engines using the other-config:pci parameter, following the recommendations for assigning multiple GPUs to a VM in XenServer:
# xe vm-param-set uuid=934c889e-ebe9-b85f-175c-9aab0628667c other-config:pci=0/0000:44:0.0,0/0000:45:0.0
In other words, do not use the vgpu_pci parameter at all.
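The value of other-config:pci is a comma-separated list in which each device is written as 0/0000: followed by its bus address, as seen in the other-config listings in this article. As a hedged sketch, the value could be assembled from a list of addresses and the resulting command printed for review before it is run. The function name build_pci_param is hypothetical, and the snippet only echoes the command rather than executing it:

```shell
#!/bin/bash
# Hypothetical helper: joins bus addresses such as 44:0.0 45:0.0 into the
# comma-separated form 0/0000:44:0.0,0/0000:45:0.0 (domain 0000 is assumed).
build_pci_param() {
    local out="" bdf
    for bdf in "$@"; do
        out="${out:+$out,}0/0000:$bdf"   # prepend a comma except for the first entry
    done
    printf '%s\n' "$out"
}

VM_UUID=934c889e-ebe9-b85f-175c-9aab0628667c   # UUID of the XenApp VM in this article

# Dry run: print the command instead of executing it.
echo xe vm-param-set uuid="$VM_UUID" other-config:pci="$(build_pci_param 44:0.0 45:0.0)"
```

Printing the command first makes it easy to compare the addresses against the lspci output before changing the VM.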
Check if the new parameters took hold:
# xe vm-param-get param-name=other-config uuid=934c889e-ebe9-b85f-175c-9aab0628667c
vgpu_pci: 0/0000:44:00.0; pci: 0/0000:44:0.0,0/0000:45:0.0; mac_seed: d229f84d-73cc-e5a5-d105-f5a3e87b82b7; install-methods: cdrom; base_template_name: Windows Server 2012 (64-bit)
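Because the other-config string also contains the leftover vgpu_pci key, it is easy to misread. A small sketch (the function name pci_entries is hypothetical) that isolates only the entries of the pci key and counts them might look like this, using the string above as sample input:

```shell
#!/bin/bash
# Hypothetical helper: reads an other-config string on stdin and prints each
# device listed under the "pci" key on its own line. The "vgpu_pci" key is
# skipped because the pattern is anchored at the start of each key.
pci_entries() {
    tr ';' '\n' | sed -n 's/^ *pci: *//p' | tr ',' '\n'
}

# Sample input copied from the vm-param-get output in this article.
other_config='vgpu_pci: 0/0000:44:00.0; pci: 0/0000:44:0.0,0/0000:45:0.0; mac_seed: d229f84d-73cc-e5a5-d105-f5a3e87b82b7; install-methods: cdrom; base_template_name: Windows Server 2012 (64-bit)'

printf '%s\n' "$other_config" | pci_entries
echo "devices assigned: $(printf '%s\n' "$other_config" | pci_entries | wc -l | tr -d ' ')"
```

A count of two here indicates that both K2 engines are listed for the VM.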
Next, turn GPU passthrough back on for the VM in XenCenter or via the CLI and start up the VM.
On the XenServer itself, you should now see no GPUs available; running nvidia-smi there fails:
Failed to initialize NVML: Unknown Error
This is good, as both K2 engines now have been allocated to the XenApp server.
On the XenServer you can also run “xn -v pci-list 934c889e-ebe9-b85f-175c-9aab0628667c” (the argument is the UUID of the VM) and should see the same two PCI devices allocated:
# xn -v pci-list 934c889e-ebe9-b85f-175c-9aab0628667c
id pos bdf
0000:44:00.0 2 0000:44:00.0
0000:45:00.0 1 0000:45:00.0
More information can be gleaned from the “xn diagnostics” command.
Next, log onto the XenApp VM and check settings using nvidia-smi.exe. The output will resemble that of the image in Figure 1.
Figure 1. Output From the nvidia-smi utility, showing the allocation of both K2 engines.
Note that the output correctly shows 4096 MiB of memory allocated for each of the two engines in the K2, totaling its full capacity of 8192 MiB. XenCenter will still show only one GPU engine allocated (see Figure 2), since it is not aware that both are allocated to the XenApp VM and currently has no way of making that distinction.
Figure 2. XenCenter GPU allocation (showing just one engine – all XenServer is currently capable of displaying)
So, how can you tell if the VM is really using both GRID engines? If you run the nvidia-smi.exe program on the XenApp VM itself, you will see it has two GPUs configured in passthrough mode (see Figure 1). Depending on how apps are launched, you will see one or the other or both of them active. As a test, we ran two concurrent Unigine “Heaven” benchmark instances; both engines showed as active, and the two instances produced metrics within 1% of each other and of the result obtained when just one instance was run. Figure 3 shows a sample screenshot of the Unigine “Heaven” benchmark running with one active instance; note that it sees both K2 engines present, even though the process is making use of just one.
Figure 3. A sample Unigine “Heaven” benchmark frame. Note the two sets of K2 engine metrics displayed in the upper right corner.
The display in the upper right-hand corner shows that one engine has allocated memory and is working, as evidenced by its correspondingly higher temperature reading and memory frequency. The result of a benchmark using OpenGL at a resolution of 1024x768 pixels is shown in Figure 4. Note again the difference between the values shown for the two engines, in particular the memory and temperature parameters.
Figure 4. Outcome of the benchmark. Note the higher memory and temperature on the second K2 engine.
When another instance is running concurrently, the memory and temperature of the other engine also rise accordingly, in addition to the load evident on the first engine, and the output from the nvidia-smi.exe utility shows activity on both engines (Figure 5).
Figure 5. Two simultaneous benchmarks running, using both GRID K2 engines, and the nvidia-smi output.
You can also see how the load on the server is affected when two instances run concurrently. In the performance graphs from XenCenter shown in Figure 6, one copy of the “Heaven” benchmark is running at first; about halfway across the graphs, a second instance is launched.
Figure 6. XenCenter performance metrics of first one, then a second concurrent Unigine “Heaven” benchmark.
The combination of two GRID K2 engines associated with a single, hefty XenApp VM works well for providing adequate capacity to support a number of concurrent users in GPU passthrough mode without the need to host additional XenApp instances. As there is a fair amount of leeway in the allocation of CPUs and memory to a virtualized instance under XenServer (up to 16 vCPUs and 128 GB of memory under XenServer 6.2 when these tests were run), one XenApp VM should be able to handle a reasonably large number of tasks. As many as six concurrent sessions of this high-demand benchmark with 800x600 high-resolution settings have been tested with the GPUs still not saturating. A more typical application, like Google Earth, consumes only around 3 to 5% of the cycles of a GRID K2 engine per instance during active use, depending on the activity and the size of the window. In other words, twenty or more sessions could be handled by each engine, or potentially 40 or more for the entire GRID K2 with a single XenApp VM, provided of course that the XenApp VM’s memory and its own CPU resources are not overly taxed.
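The session estimate follows from simple division of 100% by the per-session GPU share. The sketch below reproduces that arithmetic for the 3% and 5% figures quoted above; the function name sessions_per_engine is hypothetical, and the calculation deliberately ignores the CPU and memory limits of the VM, which may well be reached first.

```shell
#!/bin/bash
# Hypothetical helper: number of sessions one engine could carry if each
# session used $1 percent of it (integer division, so the result rounds down).
sessions_per_engine() {
    echo $(( 100 / $1 ))
}

ENGINES=2   # a GRID K2 has two GK104 engines

for pct in 3 5; do
    per_engine=$(sessions_per_engine "$pct")
    echo "${pct}% per session: ${per_engine} per engine, $(( per_engine * ENGINES )) per K2"
done
```

At the higher figure of 5% per session this gives 20 sessions per engine and 40 per card, which matches the estimate in the text; at 3% the GPU-only ceiling would be correspondingly higher.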
XenServer 6.2 already supports as many as eight physical GPUs per host, so as servers expand, one could envision having even more engines available to associate with a particular VM. Under some circumstances, passthrough mode affords more flexibility and makes better use of resources than creating specific vGPU assignments. Windows Server 2012 R2 Datacenter supports up to 64 sockets and 4 TB of memory, and hence should be able to support a significantly larger number of associated GPUs. XenServer 6.2 SP1 has a per-VM limit of 16 vCPUs and 128 GB of virtual memory. XenServer 6.5, officially released in January 2015, supports up to four GRID K2 cards in some physical servers and up to 192 GB of RAM per VM for some guest operating systems, as does the newer release documented in the XenServer 6.5 SP1 User's Guide, so there is a lot of potential processing capacity available. Hence, a very large XenApp VM could be created that delivers a lot of raw power with substantial Microsoft server licensing savings. The performance graphs in Figure 6 clearly indicate that vCPUs are the primary limiting factor in the XenApp configuration: with just two concurrent “Heaven” sessions running, about a fourth of the available CPU capacity is consumed, compared to less than 3 GB of RAM, which is only slightly more memory than was allocated with the first session alone.
These same tests were run after upgrading to XenServer 6.5 and with newer versions of the NVIDIA GRID drivers, and the configuration continued to work as before. At various times, it was run for many weeks on end with no stability issues or errors detected during the entire time.
I would like to thank my co-worker at NAU, Timothy Cochran, for assistance with the configuration of the Windows VMs used in this study. I am also indebted to Rachel Berry, Product Manager of HDX Graphics at Citrix and her team, as well as Thomas Poppelgaard and also Jason Southern of the NVIDIA Corporation for a number of stimulating discussions. Finally, I would like to greatly thank Will Wade of NVIDIA for making available the GRID K2 used in this study.