VFIO-PCI (Isolating PCI devices for virtualization)

VFIO-PCI allows you to safely hand control of a physical PCI device (such as a graphics card) to a guest operating system, bypassing the host driver and delivering near-native performance through direct memory access (DMA) and direct interrupt delivery.

This technology is widely used in workstation virtualization scenarios (passing a GPU to a Windows VM for gaming or rendering), creating high-performance virtual network cards and NVMe controllers, and in server environments for isolating hardware accelerators (e.g., FPGAs or GPUs for machine learning) among different virtual machines.

Typical problems

The main challenge is correctly configuring IOMMU groups, because devices within the same group cannot be split between the host and the guest. Device reset bugs are also common: a graphics card may fail to reinitialize for the host after the VM shuts down. Finally, conflicts with the host driver are possible, requiring the native driver to be blacklisted or pre-empted via kernel parameters (e.g., vfio-pci.ids).

How VFIO-PCI works

VFIO-PCI is built on IOMMU (Input-Output Memory Management Unit) mechanisms and interrupt virtualization. First, the user unbinds the standard host driver from the target PCI device and binds the device to the vfio-pci driver. The hypervisor (QEMU) then gains control of the device through the /dev/vfio/vfio container file descriptor and the file descriptor of the corresponding IOMMU group. Unlike traditional emulation (e.g., -device e1000 for a network card), where every command is intercepted and translated by QEMU, VFIO enables direct passthrough.

The IOMMU plays the key role: it translates the DMA addresses issued by the device (which correspond to guest physical addresses) into host physical addresses, and it prevents the device from reaching host memory outside the guest's assigned range, enforcing isolation. Compared to the older pci-assign mechanism (previously used in KVM), VFIO offers safe interrupt handling (MSI/MSI-X) without data races, interrupt remapping, and strict access control via IOMMU groups, making it the preferred standard in the modern Linux kernel.
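
As a concrete illustration of this flow, here is a minimal shell sketch, assuming a single-function GPU at the hypothetical PCI address 0000:01:00.0 and root privileges; adjust the address and the QEMU options to your system:

    DEV=0000:01:00.0

    # 1. Unbind the current host driver, if one is attached.
    echo "$DEV" > /sys/bus/pci/devices/$DEV/driver/unbind

    # 2. Ask the PCI core to use vfio-pci for this device only.
    echo vfio-pci > /sys/bus/pci/devices/$DEV/driver_override

    # 3. Re-probe so that vfio-pci claims the device.
    echo "$DEV" > /sys/bus/pci/drivers_probe

    # 4. Verify: the driver symlink should now point to vfio-pci.
    readlink /sys/bus/pci/devices/$DEV/driver

    # 5. Hand the device to a guest; QEMU itself opens /dev/vfio/vfio
    #    and the group descriptor described above.
    qemu-system-x86_64 -enable-kvm -m 8G -cpu host \
        -device vfio-pci,host=0000:01:00.0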

VFIO-PCI features

  1. Binding the vfio-pci driver. The vfio-pci kernel module implements the VFIO interface for PCI devices. It replaces the device’s standard driver, taking control of configuration registers and I/O space. This allows the device to be safely passed to a virtual machine.
  2. IOMMU groups. PCI devices are grouped into the smallest sets whose DMA transactions the IOMMU can reliably isolate from one another (determined by bus topology and ACS support). A group is the smallest unit of isolation: only an entire group can be passed through, otherwise host security is compromised. Community ls-iommu scripts, or a short shell loop over sysfs (see the first sketch after this list), display this hierarchy.
  3. Enabling IOMMU. This feature requires IOMMU activation both in the firmware setup (BIOS/UEFI: VT-d or AMD-Vi) and on the kernel command line. For Intel systems, use intel_iommu=on; for AMD, amd_iommu=on. Additionally, iommu=pt can be set to keep host-owned devices in passthrough mode, avoiding unnecessary remapping overhead. A bootloader sketch follows the list.
  4. Isolating the device from the host driver. Before binding, the standard driver must be unbound and the device handed to a stub (pci-stub) or to vfio-pci itself. This is done via driver_override: a command like echo "vfio-pci" > /sys/bus/pci/devices/.../driver_override replaces the driver at the sysfs level (the full sequence appears in the sketch in the previous section).
  5. Standard binding method using IDs. An alternative approach is to pass parameters to the vfio-pci module. In modprobe.d, add options vfio-pci ids=10de:13c2, .... When the module loads, it claims the listed devices before their native drivers can initialize; a configuration sketch, including softdep ordering, follows the list.
  6. Passthrough of devices with reset support. Some graphics cards and NVMe drives support the Function Level Reset (FLR) feature. This is critical for proper guest reboot. A device without FLR may hang at the PCI level after the VM shuts down, requiring a host reboot.
  7. Host framebuffer issue for GPUs. If a graphics card is used by the host as the primary display, it cannot simply be rebound. You need to disable efifb or simplefb, or pass video=vesafb:off on the kernel command line; otherwise the kernel retains access to the framebuffer memory.
  8. Direct Memory Access (DMA) and IOMMU. vfio-pci enables the IOMMU to translate DMA addresses from the guest to the host. This prevents attacks via the PCI bus: the device sees only the memory pages assigned to its guest, and any access outside that range triggers an IOMMU fault, which the kernel logs.
  9. MSI-X interrupt handling. vfio-pci routes MSI-X messages through the host: the interrupt is captured by the host and delivered to the guest via an eventfd descriptor (irqfd when KVM is used). MSI and legacy INTx, with virtualization of the PIC controller, are also supported.
  10. Configuration space emulation. Full access to configuration registers (256 bytes for PCI, 4096 bytes for PCIe) is emulated by the kernel. BAR registers, the command register, device status, and capability pointers are controlled by vfio-pci to prevent conflicts with the host.
  11. VFIO over VF (SR-IOV) method. For devices with Virtual Functions (VFs), vfio-pci works the same way as for physical functions. Each VF is isolated in its own IOMMU group, which allows passing through up to hundreds of virtual network cards or NVMe controllers to different VMs.
  12. SR-IOV terminology. SR-IOV (Single Root I/O Virtualization) is the hardware-level I/O virtualization mechanism that splits one physical device into multiple Virtual Functions (VFs), each of which appears as a separate PCI function.
  13. Resetting via sysfs. When the guest stops, vfio-pci initiates a device reset through sysfs (/sys/bus/pci/devices/.../reset). Success depends on support for PCIe FLR, PM reset, or secondary bus reset; the result appears in the kernel ring buffer (a manual reset sketch follows the list).
  14. Early binding via initramfs. To intercept devices early (e.g., the boot graphics card), vfio-pci is loaded before all other drivers in the initramfs. This is configured via mkinitcpio or dracut by listing the vfio modules first (sketches for both generators follow the list).
  15. Checking binding status. Information about the owning driver is in /sys/bus/pci/devices/.../driver. If this symlink points to vfio-pci, the device is ready for passthrough. Additionally, the iommu_group symlink shows which group the device belongs to.
  16. Mapping BAR buffers in QEMU. When starting a VM with -device vfio-pci,host=..., QEMU opens the group file /dev/vfio/XX and obtains a device descriptor through ioctl calls. BAR regions are then mapped into the guest address space using the mmap mechanism, with minimal overhead.
  17. Passing ROM options. vfio-pci can present an alternative option ROM to the guest. In QEMU, use romfile=/path/to/option.rom. The physical ROM is exposed to the guest read-only; supplying a separate romfile is common for GPUs whose ROM has been shadowed or modified during host boot.
  18. Hot unplugging and plugging. Devices can be dynamically detached from and attached to vfio-pci while the host is running. Using driver_override together with forced device removal (e.g., echo 1 > /sys/bus/pci/devices/.../remove) allows rebinding without a reboot (the remove/rescan cycle is sketched after the list).
  19. Limitations for grouped devices. If an IOMMU group contains a device used by the host (e.g., a USB controller), passing through the entire group is impossible. Solutions include disabling the device in the BIOS or using the unofficial ACS override kernel patch to split groups, at the cost of the isolation guarantees ACS normally provides.
  20. Monitoring via vfio tracepoints. The kernel provides tracepoints for latency analysis: vfio_pci_bar_map, vfio_pci_emulated_read. Using trace-cmd, you can debug unsupported configuration operations or stuck DMA transactions.
  21. Error logging in the ring buffer. On failures, vfio-pci writes error codes to dmesg. For example, vfio-pci: Failed to set up DMA indicates IOMMU issues. vfio-pci: Device does not support reset indicates absence of FLR. These logs are critical for diagnostics.
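
The IOMMU group hierarchy from item 2 can be displayed with a short shell loop over sysfs; this is essentially what ls-iommu-style scripts do:

    # Print every IOMMU group and the PCI devices it contains.
    for g in /sys/kernel/iommu_groups/*; do
        echo "IOMMU group ${g##*/}:"
        for d in "$g"/devices/*; do
            printf '    '
            lspci -nns "${d##*/}"
        done
    done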
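
For item 3, a typical bootloader configuration looks like this GRUB sketch for an Intel system (the file path and the regeneration command vary by distribution):

    # /etc/default/grub
    GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on iommu=pt"

    # Then regenerate the config and reboot:
    #   grub-mkconfig -o /boot/grub/grub.cfg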
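
The ID-based binding from item 5 is a plain modprobe.d configuration. The first ID below is the GTX 970 from the text; the second, its HDMI audio function, is illustrative and should be taken from your own lspci -nn output:

    # /etc/modprobe.d/vfio.conf
    options vfio-pci ids=10de:13c2,10de:0fbb

    # Make sure vfio-pci loads before the drivers that would otherwise
    # claim these devices.
    softdep nouveau pre: vfio-pci
    softdep snd_hda_intel pre: vfio-pci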
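
The sysfs reset from item 13 can be checked and triggered manually (same illustrative device address as above):

    DEV=0000:01:00.0

    # The 'reset' attribute exists only if the kernel found a usable
    # reset method (FLR, PM reset, or secondary bus reset).
    ls /sys/bus/pci/devices/$DEV/reset 2>/dev/null || echo "no reset support"

    # Trigger the reset and inspect the kernel ring buffer for the result.
    echo 1 > /sys/bus/pci/devices/$DEV/reset
    dmesg | tail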
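
Early binding from item 14 depends on the initramfs generator; two common sketches follow (on older kernels the separate vfio_virqfd module also had to be listed):

    # Arch Linux (mkinitcpio), /etc/mkinitcpio.conf — vfio modules first:
    MODULES=(vfio_pci vfio vfio_iommu_type1)
    # then rebuild: mkinitcpio -P

    # Fedora/RHEL (dracut), /etc/dracut.conf.d/10-vfio.conf:
    force_drivers+=" vfio_pci vfio vfio_iommu_type1 "
    # then rebuild: dracut -f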
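
The remove/rescan cycle from item 18 looks as follows; note that driver_override does not survive the remove, so the override and re-probe sequence from the earlier sketch must be repeated if vfio-pci should claim the device again:

    DEV=0000:01:00.0

    # Detach the device from the PCI tree entirely.
    echo 1 > /sys/bus/pci/devices/$DEV/remove

    # Rescan the bus: the device reappears and is probed afresh.
    echo 1 > /sys/bus/pci/rescan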

Comparison of VFIO-PCI with similar features

  • VFIO-PCI vs VFIO-MDEV. VFIO-PCI provides direct assignment of an entire physical device to a virtual machine, whereas VFIO-MDEV allows splitting a single physical device (e.g., a GPU) into multiple mediated virtual instances. MDEV is preferable when one accelerator must be shared among several VMs, but it requires specific driver support, while VFIO-PCI is more universal for assigning discrete devices.
  • VFIO-PCI vs QEMU emulated devices. QEMU emulated devices are entirely software-based, do not use hardware I/O virtualization, and provide low performance. VFIO-PCI leverages IOMMU for direct guest access to hardware, achieving native speeds but losing features like live migration and snapshots, which are possible with emulation.
  • VFIO-PCI vs legacy PCI passthrough. Traditional PCI passthrough in KVM before VFIO (pci-assign) relied on a KVM-internal mechanism with limited isolation. VFIO-PCI provides strict isolation via IOMMU groups, safe interrupt handling (MSI-X), a unified interface through /dev/vfio, and VF support for SR-IOV.
  • VFIO-PCI vs UIO (Userspace I/O). UIO provides simple access to device memory from userspace but ignores IOMMU, lacks DMA safety, and does not handle interrupts well in multithreaded environments. VFIO-PCI, in contrast, manages DMA remapping, ensures device isolation, and supports interrupt remapping, which is critical for production virtualization environments.
  • VFIO-PCI vs vhost-user-blk. vhost-user-blk is optimized for block devices (virtio-blk) with zero-copy data via shared memory, but does not work with arbitrary PCIe devices. VFIO-PCI is universal — suitable for GPUs, NVMe drives, and network cards — but has overhead for DMA registration and requires dedicated device assignment that cannot be shared among VMs.

OS and driver support

VFIO-PCI has been part of the Linux kernel since version 3.6 (2012). It requires the vfio-pci and vfio_iommu_type1 modules along with IOMMU support in the chipset (Intel VT-d / AMD-Vi). Guest systems need only their standard graphics card, NVMe, or USB controller drivers, because the device is passed through directly; on the host, the native driver is replaced by vfio-pci, which mediates the configuration space and delivers interrupts via eventfd.

Security

Security is achieved through hardware IOMMU isolation, which translates DMA requests from the device into the address space of only the assigned virtual machine, preventing access to host memory or that of other guests. VFIO-PCI additionally validates all mmap operations and interrupt mappings performed via the /dev/vfio file descriptor, and blocks direct DMA from uncontrolled devices. However, without proper ACS (Access Control Services) configuration, attacks remain theoretically possible, for example a device spoofing another function's requester ID or peer-to-peer DMA that bypasses the IOMMU.

Logging

Logging in VFIO-PCI is done at several levels. The kernel, via tracepoints in /sys/kernel/debug/tracing/events/vfio, records all device bind/unbind operations, DMA transactions, and MMIO accesses. Userspace tools (QEMU, libvirt) write to syslog and their own log files for errors such as failure to open an IOMMU group (/dev/vfio/N), insufficient MSI-X vectors, or AER conflicts. This allows tracking why a device fails group validation (non-ACS topology).

Limitations

The main limitations: the device must sit in its own IOMMU group (otherwise every device in the group must be passed through together, for example a graphics card together with its HDMI audio function). Live migration with device state preservation is generally unavailable (only select devices implement the VFIO migration interface; see History and development). Passthrough is impossible while the host driver remains bound (hence driver_override), and some GPUs require a reset (FLR or secondary bus reset) that not all chips support.

History and development

The history of VFIO-PCI began in 2012 with patches from Alex Williamson that replaced legacy KVM pci-assign and Xen pciback. By 2016 it had become the primary passthrough mechanism in QEMU. Later releases added the VFIO migration region for select devices (cloud scenarios), live PCIe rescan support, vGPU support via the mdev interface, and, in kernel 6.2 and later, optimizations around iommufd, which moves IOMMU management into a generic subsystem usable by components other than KVM.