Virtualization Blog

Discussions and observations on virtualization.

PCI Pass-Through on XenServer 7.0

Plenty of people have asked me over the years how to pass-through generic PCI devices to virtual machines running on XenServer. Whilst it isn't officially supported by Citrix, it's none the less perfectly possible to do; just note that your mileage may vary, because clearly it's not rigorously tested with all the possible different types of device people might want to pass-through (from TV cards, to storage controllers, to USB hubs...!).

The process on XenServer 7.0 differs somewhat from previous releases, in that the Dom0 control domain is now CentOS 7.0-based, and UEFI boot (in addition to BIOS boot) is supported. Hence, I thought it would be worth writing up the latest instructions, for those who are feeling adventurous.

Of course, XenServer officially supports pass-through of GPUs to both Windows and Linux VMs, hence this territory isn't as uncharted as it might first appear: pass-through in itself is fine. Any wrinkles will be to do with the particular piece of hardware in question.

A Short Introduction to PCI Pass-Through

Firstly, a little primer on what we're trying to do.

Your host will have a PCI bus, with multiple devices hosted on it, each with its own unique address on the bus (more on that later; just remember this as "B:D.f"). In addition, each device has a vendor ID and a device ID, which allow the operating system to look up its human-readable name in the PCI IDs database text file on the system. For example, vendor ID 10de corresponds to the NVIDIA Corporation, and device ID 11b4 corresponds to the Quadro K4200. Each device can then (optionally) have subsystem (sub-vendor and sub-device) IDs, e.g. if an OEM has its own branded version of a supplier's component.
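If you want to see those numeric IDs for yourself, lspci will print them alongside the names if you pass the -nn flag. A quick illustrative sketch (the bus address and output line here are examples, not taken from the host listed below):

lspci -nn -s 82:00.0
82:00.0 VGA compatible controller [0300]: NVIDIA Corporation GK104GL [Quadro K4200] [10de:11b4] (rev a1)

The [10de:11b4] pair at the end is the vendor ID and device ID just described.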

Normally, XenServer's control domain, Dom0, is given all PCI devices by the Xen hypervisor. Drivers in the Linux kernel running in Dom0 each bind to particular PCI device IDs, and thus make the hardware actually do something. XenServer then provides synthetic devices (emulated or para-virtualised) such as SCSI controllers and network cards to the virtual machines, passing the I/O through Dom0 and then out to the real hardware devices.

This is great, because it means the VMs never see the real hardware, and thus we can live migrate VMs around, or start them up on different physical machines, and the virtualised operating systems will be none the wiser.

If, however, we want to give a VM direct access to a piece of hardware, we need to do something different. The main reason one might want to do this is that the hardware in question isn't easy to virtualise, i.e. the hypervisor can't provide a synthetic device to each VM and then somehow "share out" the real hardware between those synthetic devices. This is the case for everything from an SSL offload card to a GPU.

Aside: Virtual Functions

There are three ways of sharing out a PCI device between VMs. The first is what XenServer does for network cards and storage controllers, where a synthetic device is given to the VM, but then the I/O streams can effectively be mixed together on the real device (e.g. it doesn't matter that traffic from multiple VMs is streamed out of the same physical network card: that's what will end up happening at a physical switch anyway). That's fine if it's I/O you're dealing with.

The second is to use software to share out the device. Effectively you have some kind of "manager" of the hardware device that is responsible for sharing it between multiple virtual machines, as is done with NVIDIA GRID GPU virtualisation, where each VM still ends up with a real slice of GPU hardware, but controlled by a process in Dom0.

The third is to virtualise at the hardware device level, and have a PCI device expose multiple virtual functions (VFs). Each VF provides some subset of the functionality of the device, isolated from other VFs at the hardware level. Several VMs can then each be given their own VF (using exactly the same mechanism as passing through an entire PCI device). A couple of examples are certain Intel network cards, and AMD's MxGPU technology.
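Creating VFs isn't XenServer-specific: on a Linux host it's typically done by writing the desired number of VFs to the device's sriov_numvfs file in sysfs, assuming the card and its driver support SR-IOV. A minimal sketch, with an illustrative device address:

echo 4 > /sys/bus/pci/devices/0000:04:00.0/sriov_numvfs
lspci | grep -i "Virtual Function"

Each VF then shows up as an ordinary PCI device with its own B:D.f address, which is why the pass-through steps below apply to a VF unchanged.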

OK, So How Do I Pass-Through a Device?

Step 1

Firstly, we have to stop any driver in Dom0 from claiming the device. In order to do that, we'll need to ascertain the ID of the device we're interested in passing through. We'll use B:D.f (Bus, Device, function) numbering to specify it.

Running lspci will tell you what's in your system:

davidcot@helical:~$ lspci
00:00.0 Host bridge: Intel Corporation 82X38/X48 Express DRAM Controller
00:01.0 PCI bridge: Intel Corporation 82X38/X48 Express Host-Primary PCI Express Bridge
00:06.0 PCI bridge: Intel Corporation 82X38/X48 Express Host-Secondary PCI Express Bridge
00:1a.0 USB controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #4 (rev 02)
00:1a.1 USB controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #5 (rev 02)
00:1a.2 USB controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #6 (rev 02)
00:1a.7 USB controller: Intel Corporation 82801I (ICH9 Family) USB2 EHCI Controller #2 (rev 02)
00:1b.0 Audio device: Intel Corporation 82801I (ICH9 Family) HD Audio Controller (rev 02)
00:1c.0 PCI bridge: Intel Corporation 82801I (ICH9 Family) PCI Express Port 1 (rev 02)
00:1c.5 PCI bridge: Intel Corporation 82801I (ICH9 Family) PCI Express Port 6 (rev 02)
00:1d.0 USB controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #1 (rev 02)
00:1d.1 USB controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #2 (rev 02)
00:1d.2 USB controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #3 (rev 02)
00:1d.7 USB controller: Intel Corporation 82801I (ICH9 Family) USB2 EHCI Controller #1 (rev 02)
00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev 92)
00:1f.0 ISA bridge: Intel Corporation 82801IR (ICH9R) LPC Interface Controller (rev 02)
00:1f.2 SATA controller: Intel Corporation 82801IR/IO/IH (ICH9R/DO/DH) 6 port SATA Controller [AHCI mode] (rev 02)
00:1f.3 SMBus: Intel Corporation 82801I (ICH9 Family) SMBus Controller (rev 02)
01:00.0 VGA compatible controller: NVIDIA Corporation G86 [Quadro NVS 290] (rev a1)
04:00.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5754 Gigabit Ethernet PCI Express (rev 02)

Once you've found the device you're interested in, say 04:00.0 for my network card, we tell Dom0 to exclude it from being bound by normal drivers. You can add this to the Dom0 boot line as follows:

/opt/xensource/libexec/xen-cmdline --set-dom0 "xen-pciback.hide=(04:00.0)"

(What this does is edit /boot/grub/grub.cfg for you, or if you're booting using UEFI, /boot/efi/EFI/xenserver/grub.cfg instead!)
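If you need to hide more than one device (say a GPU plus its companion audio function), the parameter accepts a list of bracketed entries; a sketch with illustrative addresses:

/opt/xensource/libexec/xen-cmdline --set-dom0 "xen-pciback.hide=(04:00.0)(01:00.0)"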

Step 2

Reboot! At the moment, a driver in Dom0 probably still has hold of your device, hence you need to reboot the host to get it relinquished.
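To check that the device really has been relinquished after the reboot, ask lspci which kernel driver (if any) is now bound to it; a quick sketch using the example address from step 1:

lspci -k -s 04:00.0

If the hide took effect, the "Kernel driver in use" line should show pciback (the exact name can vary between kernels, e.g. xen-pciback) or be absent entirely, rather than showing the device's usual driver.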

Step 3

The easy bit: tell the toolstack to assign the PCI device to the VM. Run:

xe vm-list

And note the UUID of the VM you're interested in, then:

xe vm-param-set other-config:pci=0/0000:<B:D.f> uuid=<vm uuid>

Where, of course, <B:D.f> is the ID of the device you found in step 1 (like 04:00.0), and <vm uuid> corresponds to the VM you care about.
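For example, with the network card found above, the command would be the first line below; to hand more than one device to the same VM, the comma-separated form on the second line should work (the second address is purely illustrative):

xe vm-param-set other-config:pci=0/0000:04:00.0 uuid=<vm uuid>
xe vm-param-set other-config:pci=0/0000:04:00.0,0/0000:04:00.1 uuid=<vm uuid>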

Step 4

Start your VM. At this point if you run lspci (or equivalent) within the VM, you should now see the device. However, that doesn't mean it will spring into life, because...
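In a Linux guest, running lspci with -nnk is a quick way to confirm both that the device has appeared and whether a driver has bound to it (note that the device's address inside the VM will generally differ from its address on the host):

lspci -nnk

In a Windows guest, the device will appear in Device Manager, typically with a warning marker until step 5 is done.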

Step 5

Install a device driver for the piece of hardware you passed through. The operating system within the VM may already ship with a suitable device driver, but if not, you'll need to go and get the appropriate one from the device manufacturer. This will normally be the standard Linux/Windows/other one that you would use for a physical system; the only difference occurs when you're using a virtual function, where the VF driver is likely to be a special one.

Health Warnings

As indicated above, pass-through has advantages and disadvantages. You'll get direct access to the hardware (and hence, for some functions, higher performance), but you'll forgo luxuries such as the ability to live migrate the virtual machine around (there's state now sitting on real hardware, versus virtual devices), and the ability to use high availability for that VM (because HA doesn't take into account how many free PCI devices of the right sort you have in your resource pool).

In addition, not all PCI devices take well to being passed through, and not all servers like doing so (e.g. if you're extending the PCI bus in a blade system to an expansion module, this can sometimes cause problems). Your mileage may therefore vary.

If you do get stuck, head over to the XenServer discussion forums and people will try to help out, but just note that Citrix doesn't officially support generic PCI pass-through, hence you're in the hands of the (very knowledgeable) community.

Conclusion

Hopefully this has helped clear up how pass-through is done on XenServer 7.0; do comment and let us know how you're using pass-through in your environment, so that we can learn what people want to do, and think about what to officially support on XenServer in the future!

Comments 16

Guest - Jeff Riechers on Thursday, 03 November 2016 11:27

I have done this lots of times. I use a Z800 server that has onboard FireWire. I ported it into a VM for hooking up a camera so my kids could do stop-motion animation from a XenApp session. I have also ported USB into a VM for a licensing key fob. I wish all the cool stuff like this you can do were more publicly known.

Tobias Kreidl on Saturday, 05 November 2016 03:38

Yay, great to see this published in clear, concise steps! This is one for the XenServer forum to point to! :D

Guest - quadro5000 on Saturday, 05 November 2016 16:57

I need to pass through a Quadro 5000 GPU to one of my VMs.
Typing lspci in the XenServer console gives me back two device IDs, one for the actual GPU and one for the built-in audio device.
Do I need to use this procedure for each device ID?

David Cottingham on Monday, 07 November 2016 10:18

If you want both the audio and GPU devices given to your VM, then yes, you need to use the procedure once for each device.

However, if you're not looking to have the audio device passed through, then you _should_ be able to just pass-through the GPU device.

Guest - Alexey on Sunday, 06 November 2016 06:34

I think virtual functions for I/O devices like Intel 10+ Gbit cards should be supported by Citrix. I haven't measured how much performance gain it gives, but it should definitely help offload the CPU. And it is quite simple to implement: the option should be enabled if every server in the cluster has a supported card with virtual functions, and available VFs should be assigned to VMs automatically.

David Cottingham on Monday, 07 November 2016 10:06

Understood: it would be a performance gain for at least some use cases, as you're getting raw access to the NIC. The downside is then that you no longer benefit from migration, ACLs on the OVS, and so on.

In terms of implementation, it's more about the testing: each hardware vendor will need to test/certify that pass-through of VFs works with various guest OSs. Certainly not impossible, just a large test matrix.

But yes, supporting pass-through of SR-IOV VFs is something I'm interested in doing. The more people who can tell me what they'd use it for, the more likely it is that it will go up the priority list ;-).

Allan Smith on Tuesday, 08 November 2016 19:48

Great article. I've been looking for a way to pass-through multiple GRID K2 cores to one VM. Will give this a shot. Thanks!

Stefan Bregenzer on Friday, 11 November 2016 22:53

Hi David,

great article and very interesting!!

As I'm very much interested in AMD's MxGPU, I'd like to know if PCI pass-through is the standard/only way MxGPU works. Is it planned to integrate the setup of GPU pass-through in XenCenter in the future, like it works for NVIDIA GRID or Intel Iris Pro? (create new VM --> GPU tab --> select GPU)

If this is the standard way, how can AMD provide high availability with this solution? Also the live migration feature would be nice.

Thanks for dealing with my questions!

Best regards,

Stefan

David Cottingham on Monday, 14 November 2016 11:55

Hi Stefan,

Thanks!

AMD MxGPU uses SR-IOV VFs, hence yes, it does use pass-through. I'm not aware of whether AMD offer HA capabilities or not with this, but remember that HA is there to automatically restart VMs that are dead (due to a host failure) hence provided that a host has an appropriate GPU card in place, theoretically there's no technical problem with HA just starting up a new copy of the dead VM.

The only thing that would need to be done is to get the HA planner algorithm to point out to the user if they have enough GPUs in hosts in the pool to tolerate failures. Today, XenServer's HA planner doesn't have that capability.

Live migration is more difficult, because there's state in the GPU hardware that somehow needs to be migrated.

In terms of future plans, I can't comment on roadmap, but as you'd expect, we're working with AMD on GPU technologies in XenServer (as we already support pass-through of whole AMD GPUs today).

Hope this helps!

Cheers,

David.

Stefan Bregenzer on Wednesday, 16 November 2016 21:24

Hi David,

thank you very much for your explanation! Now it is more clear for me.

I'm really looking forward to seeing how things develop in XenServer's support of AMD MxGPU, because I personally think that this technology, combined with attractive hardware costs (without continuous licensing fees), will push virtualization forward without suffering from a lack of GPU. Intel Iris Pro is a bit weak in terms of scalability, I think, and NVIDIA GRID is too expensive for small companies.

I hope the integration will lead to a user-friendly workflow in XenCenter that prevents faults and is solid for a production environment. Please provide updates on new developments in AMD GPU virtualization (you're the only one I've found during my ongoing search of the internet who gives this much insight into what is possible at the moment!!!).

Best regards,

Stefan

David Cottingham on Wednesday, 16 November 2016 21:58

Thanks Stefan: will definitely let people know when we can share more :-).

Guest - wiliamo on Tuesday, 15 November 2016 07:02

I have done all the steps, but when I want to turn on my VM it gives me the error: Xenctrl.Error("38:Function not implemented")

David Cottingham on Wednesday, 16 November 2016 22:00

What device are you trying to pass through?

Hernan on Thursday, 09 March 2017 14:21

I've modified the grub.cfg file to hide a certain PCI device, and restarted Xen several times, but when I use the lspci command from the console I still see the device, so I understand that it can't be assigned directly to a VM, am I right?
What else can I do to make it work?
Thanks

Sowmya Panuganti on Thursday, 30 March 2017 11:29

In XenServer 7, I've done the steps to pass through a PCIe device to a VM. I observe that if the VM is Linux (Ubuntu), the device driver loads fine. However, when I have Windows running in the VM, the device driver couldn't load. The device driver is built into the Linux/Windows OS.
I did not have any issue in XenServer 6.2: PCI pass-through to a Windows VM and the Windows device driver worked fine. I cannot doubt my hardware.
Wondering if it is the Dom0 CentOS 7 combination.
I would like to move to XenServer 7 and would appreciate any pointers or workarounds.

Sowmya Panuganti on Wednesday, 05 April 2017 03:58

Hi, I have tried the steps, and at step 5 my device drivers in the VM failed to load on the device.
The same device and driver worked fine on XenServer 6.2 and 6.5, but driver loading fails on XenServer 7.
I am seeing a problem only when the VM is Windows. In XenServer 7, I could pass through to a Linux VM and load the in-box driver fine, but Windows did not work.
What could have changed in XenServer 7 related to this?
Any insight would be appreciated. Thanks!

