PCIe Passthrough (Direct virtual machine access to PCIe device)

PCIe Passthrough allows a guest virtual machine operating system to directly manage a physical device connected to the PCI Express bus, bypassing the hypervisor's software layer. This delivers performance comparable to running on real hardware.

This technology is used in server environments to virtualize resource-intensive applications that require direct access to GPUs, NVMe drives, or network cards. It is also in demand in workstation scenarios where a user needs to run, for example, a gaming- or graphics-oriented operating system inside a virtual machine without losing performance.

Typical problems

The main problem is the inability to share a device among multiple virtual machines without hardware support such as SR-IOV. Additionally, once a device is assigned to a guest system, the host system loses access to it. Errors may also arise due to incompatible interrupt implementations or the lack of a reset mechanism on the device.

How it works

At the hardware level, the processor and chipset support I/O domains, specifically DMA Remapping and Interrupt Remapping, as described in the Intel VT-d and AMD-Vi specifications. The hypervisor administrator uses the IOMMU to place the PCIe device into a separate isolation domain. When the virtual machine starts, the hypervisor does not emulate the device; instead, it creates a dedicated address translation table for the guest. The guest OS loads its own driver, which programs the device registers directly. All DMA requests from the device are intercepted by the IOMMU, which dynamically translates addresses from the guest's address space into real host memory addresses while also checking access rights. Interrupts from the device are likewise intercepted and rerouted to the guest virtual machine without hypervisor involvement, except in rare cases that require the software emulator to step in. As a result, the device runs at near-native speed, and memory isolation is enforced by hardware, preventing unauthorized access to host data or data belonging to other virtual machines.
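
On a Linux host, this behavior can be observed directly: the kernel log records whether DMA remapping is active, and sysfs exposes the isolation groups the IOMMU has built. A minimal sketch, assuming an Intel system (look for AMD-Vi messages on AMD platforms):

  # Confirm that the IOMMU / DMA remapping is active
  dmesg | grep -i -e DMAR -e IOMMU

  # Each directory under iommu_groups is one isolation domain;
  # the symlinks inside it are the devices that must be assigned together
  for g in /sys/kernel/iommu_groups/*; do
      echo "Group ${g##*/}:"
      ls "$g/devices"
  done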

PCIe Passthrough features

  1. IOMMU. A basic requirement for Passthrough is an IOMMU enabled both in the BIOS/UEFI and in the kernel. It provides hardware isolation of DMA requests from the guest system, protecting host memory from unauthorized access.
  2. I/O groups. The device should be in its own IOMMU group. A group includes all functions whose transactions the IOMMU cannot isolate from one another (for example, functions behind a bridge without ACS). If the device shares a group with other devices, the whole group must be handed over together, otherwise Passthrough is not possible.
  3. VFIO driver. The vfio-pci kernel module replaces the standard driver, disabling host management of the device. This allows userspace (QEMU) to directly control the PCI configuration space and handle interrupts.
  4. VFIO and vfio-pci. VFIO (Virtual Function I/O) is the kernel framework for direct device I/O virtualization from userspace; vfio-pci is its PCI driver used to isolate PCI devices for assignment to virtual machines.
  5. Host preparation. Load the vfio, vfio_iommu_type1, and vfio_pci modules. Add the kernel parameter intel_iommu=on or amd_iommu=on. It is recommended to add the modules to the initramfs (see the host-preparation sketch after this list).
  6. Device isolation. The lspci -nn command reveals the device's bus address and IDs (vendor:device). Write vfio-pci to /sys/bus/pci/devices/0000:XX:00.0/driver_override, unbind the current host driver, and re-probe the device so it binds to vfio-pci, as shown in the sketch after this list.
  7. QEMU configuration. On the QEMU command line, the argument -device vfio-pci,host=XX:XX.X assigns the device to the guest. Additional parameters include multifunction, x-vga (for GPUs), and romfile to supply a boot ROM image (see the QEMU sketch after this list).
  8. GPU passthrough. For a graphics card, its audio function (HDMI Audio) within the same group must also be passed through. The GPU then runs natively on the vendor driver (NVIDIA or AMD) installed inside the VM.
  9. Device reset. Many GPUs do not implement FLR (Function Level Reset) correctly, which can hang the host when the VM is restarted. Solutions include using vendor-reset or assigning the GPU to a dedicated VM that is not rebooted.
  10. Primary GPU issue. If the GPU is used by the host for display output, Passthrough will not work. Blacklist or unbind the host kernel driver (nouveau, amdgpu) or keep a separate GPU for the host (e.g., integrated graphics).
  11. Network cards. Passing a NIC through via VFIO provides near-native latency and full line rate. Inside the VM, the card appears as a physical device. Live Migration is not supported without additional mechanisms (SR-IOV with VFs).
  12. SR-IOV. A single-port card that supports virtual functions allows the physical port to be shared. Each VM can be assigned a separate VF via VFIO, while the host retains control of the PF (see the SR-IOV sketch after this list). Requires NIC support.
  13. VF and PF. A VF (Virtual Function) is the lightweight, hardware-virtualized I/O function exposed to a guest; the PF (Physical Function) is the full function through which the host manages the device and its VFs.
  14. MSI interrupts. VFIO prefers MSI/MSI-X over classic INTx. MSI reduces interrupt handling overhead. When passing through a device, ensure the guest uses a driver with MSI support.
  15. Memory limitation. Guest memory must be pinned for DMA, so the host needs enough free RAM; reserving hugepages for the guest is recommended. Without hugepages, performance decreases, and memory fragmentation may cause VM startup failures.
  16. GRUB configuration. Kernel parameters: iommu=pt (passthrough mode, minimal overhead for host-owned devices) and vfio_iommu_type1.allow_unsafe_interrupts=0. Platforms without a usable IOMMU are limited to VFIO's unsafe no-IOMMU test mode, which removes isolation. After changes, run update-grub.
  17. Group verification. The lspci -vvt utility shows the PCI tree. The command find /sys/kernel/iommu_groups/ -type l lists the devices in all groups. If a device shares a group with other endpoints behind a PCIe bridge that lacks ACS support, it cannot be passed through on its own.
  18. ACS. Modern platforms with ACS support split devices into groups correctly. On older systems, the ACS override patch (kernel parameter pcie_acs_override=downstream,multifunction) can be used, but it weakens hardware isolation.
  19. Guest OS. No special drivers are required inside the VM; standard drivers are used. Windows requires installation of the vendor's driver. A Linux guest sees the device as a normal PCIe device and loads its regular driver; pci-stub and vfio-pci are needed only on the host.
  20. Security. Incorrect grouping may allow a guest to mount a DMA attack against host memory. Use strict IOMMU invalidation (iommu.strict=1) and verify that the device is in an isolated group. For production, prefer server platforms with full ACS support.
  21. Diagnostics. The kernel log (dmesg | grep -i vfio) shows binding errors. Common problems: the device is still held by a host driver, an incorrect BDF (bus:device.function) number, or a missing vfio_pci module in the initramfs. On success, the VM starts without errors.
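
A minimal host-preparation sketch tying together items 5, 6, and 16 above, assuming a Linux host with GRUB, an Intel CPU, and a device at the placeholder address 0000:01:00.0 (substitute your own BDF):

  # 1. Kernel parameters in /etc/default/grub (use amd_iommu=on on AMD platforms):
  #    GRUB_CMDLINE_LINUX_DEFAULT="... intel_iommu=on iommu=pt"
  sudo update-grub && sudo reboot

  # 2. Load the VFIO modules (also add them to the initramfs so they load early)
  sudo modprobe -a vfio vfio_iommu_type1 vfio-pci

  # 3. Hand the device over to vfio-pci
  lspci -nn -s 01:00.0                # note the vendor:device IDs
  echo vfio-pci     | sudo tee /sys/bus/pci/devices/0000:01:00.0/driver_override
  echo 0000:01:00.0 | sudo tee /sys/bus/pci/devices/0000:01:00.0/driver/unbind   # only if a host driver is bound
  echo 0000:01:00.0 | sudo tee /sys/bus/pci/drivers_probe

  # 4. Verify the result: "Kernel driver in use: vfio-pci"
  lspci -nnk -s 01:00.0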
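
A sketch of the QEMU invocation from items 7 and 8, assuming the GPU at 01:00.0 and its HDMI audio function at 01:00.1 are both bound to vfio-pci; the memory size, disk image, and OVMF firmware path are placeholders, not a complete production command line:

  qemu-system-x86_64 \
      -enable-kvm -machine q35 -cpu host -m 8G \
      -drive if=pflash,format=raw,readonly=on,file=/usr/share/OVMF/OVMF_CODE.fd \
      -device vfio-pci,host=01:00.0,multifunction=on,x-vga=on \
      -device vfio-pci,host=01:00.1 \
      -drive file=guest.qcow2,if=virtio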
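
For the SR-IOV case from item 12, virtual functions are usually created through sysfs; this sketch assumes a NIC whose physical function sits at the placeholder address 0000:03:00.0 and whose driver supports sriov_numvfs:

  # How many VFs the PF can expose
  cat /sys/bus/pci/devices/0000:03:00.0/sriov_totalvfs

  # Create four virtual functions; they show up as new PCI devices
  echo 4 | sudo tee /sys/bus/pci/devices/0000:03:00.0/sriov_numvfs

  # Each VF can now be bound to vfio-pci and passed to a different VM,
  # while the host keeps managing the PF
  lspci -nn | grep -i 'virtual function'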

Comparisons

  • PCIe Passthrough vs SR-IOV. Passthrough assigns an entire physical device to a single VM, providing maximum performance and isolation but precluding sharing. SR-IOV creates virtual functions, allowing one device to be shared among several VMs and reducing emulation overhead, but this requires hardware support in the device itself.
  • PCIe Passthrough vs VirtIO (emulation). Passthrough works via direct hardware access, bypassing the hypervisor and delivering near-native performance for GPU/NVMe. VirtIO uses paravirtualization with a guest driver that cooperates with the hypervisor, offering good performance for disks and networking but with additional latency and a dependence on the hypervisor; it wins in migration flexibility.
  • PCIe Passthrough vs Mediated Devices (mdev). Mdev, as in Intel GVT-g or NVIDIA vGPU, partitions a single physical device among several VMs, each receiving a share of its resources. Passthrough provides full control and isolation but gives up sharing. Mdev is better for graphics-heavy VM fleets but requires a complex host driver and does not always support live migration.
  • vGPU (splitting a single physical GPU into multiple virtual devices) is an example of the mdev approach described above.
  • PCIe Passthrough vs Paravirtualized SCSI/NVMe. Paravirtualized protocols (e.g., VMware PVSCSI) emulate a controller with an optimized data path, preserving live migration and snapshots. Passthrough gives the VM a real NVMe drive, eliminating double buffering, but blocks migration and snapshots because the device state is tied to the physical hardware.
  • PCIe Passthrough vs Software-based I/O (QEMU/userspace). QEMU with userspace drivers (DPDK, SPDK) bypasses the kernel I/O stack for fast operations without true passthrough. Passthrough is safer thanks to the IOMMU, eliminating emulation vulnerabilities, but requires pinned, isolated guest memory. Software approaches are more flexible in queue configuration and are not tied to a specific PCIe device, but they incur data-copying overhead.

OS and driver support

PCIe Passthrough requires hardware I/O virtualization (Intel VT-d or AMD-Vi), hypervisor support (KVM, Xen, VMware ESXi), and a guest OS that carries a real device driver rather than a paravirtualized one. The host system must exclude the device from its own drivers (e.g., via vfio-pci on Linux), while the guest treats it as native hardware, programming its BARs directly through the standard driver.
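
On a libvirt/KVM stack this exclusion is typically handled by libvirt itself; a minimal sketch, where the PCI address and the domain name gaming-vm are placeholders:

  # Detach the device from its host driver (libvirt rebinds it to vfio-pci)
  virsh nodedev-detach pci_0000_01_00_0

  # Start the guest whose XML contains the matching <hostdev> entry
  virsh start gaming-vm

  # Return the device to the host after the guest is shut down
  virsh nodedev-reattach pci_0000_01_00_0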

Security

Security is achieved by using the IOMMU to translate DMA addresses from the guest's physical address space into system memory, preventing DMA attacks in which the guest attempts to read host memory or the memory of other VMs. In addition, interrupt remapping and separate protection contexts per device are used, but vulnerabilities remain if the IOMMU itself has flaws or if SR-IOV is used without proper device grouping.

Logging

PCIe Passthrough logging happens at several levels: the host kernel logs binding and unbinding of devices to the vfio-pci module as well as IOMMU translation faults (e.g., "DMAR: DRHD: handling fault status" in the Linux kernel log), while QEMU can record VFIO-related events through its -trace facility. Guest drivers perform their normal logging of access errors, which the host no longer intercepts.
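
A few commands that cover these log sources on a typical Linux/KVM host; the trace pattern requires a QEMU build with tracing enabled, and the elided arguments stand in for the rest of the normal command line:

  # Host kernel: VFIO binding problems and IOMMU translation faults
  dmesg | grep -iE 'vfio|DMAR|AMD-Vi'
  journalctl -k | grep -iE 'vfio|DMAR'

  # QEMU: enable VFIO trace events (appended to the usual invocation)
  qemu-system-x86_64 ... -trace 'enable=vfio_*'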

Limitations

The main limitations are: inability to perform live migration of a VM with a passed-through device (the device's state cannot be serialized), the requirement to pass the entire IOMMU group as a unit (an ACS override may be necessary), the need for a dedicated device per VM (unless the hardware virtualizes the port via SR-IOV), and dependence on a correct device reset implementation (FLR or a bus-level reset) for repeated passthrough.

History and evolution of PCIe Passthrough

The technology began with Xen in 2006 (pci = [ ... ] option), was later implemented in KVM using the pci-assign module (now deprecated), which was replaced by vfio-pci in Linux kernel 3.6 (2012). Subsequent improvements included support for vfio-no-iommu (testing only), hotplug/hot-unplug of devices, partial emulation of PCIe capabilities, and from the 2020s onward, active development of PCIe ATS/PRI support for acceleration.