VFIO (Virtual Function I/O) is a Linux kernel mechanism that safely transfers control of a physical device (such as a graphics card) from the host system directly to a guest OS, bypassing hypervisor device emulation to achieve near-native performance.
VFIO is used in virtualization environments with demanding graphics or compute workloads: gaming virtual machines, GPU-accelerated scientific computing, professional workstations with NVMe controller or sound card passthrough. The technology is indispensable for cloud providers allocating physical GPUs to tenants.
Typical Issues
The main difficulty is IOMMU group fragmentation, where a device cannot be detached separately from its companion devices (for example, a USB controller built into the same graphics card). Full platform support for Intel VT-d or AMD-Vi (CPU, chipset, and firmware) is required. Conflicts also arise when a host driver has already initialized the device, which can cause system hangs.
How It Works
VFIO relies on the IOMMU hardware and the kernel's IOMMU subsystem. The host driver is first detached from the target device, after which the device is bound to the vfio-pci driver. At that point, userspace gains direct control over the PCI configuration space, BAR memory regions, and interrupts through a special file descriptor obtained via /dev/vfio/N. Unlike emulation (e.g., QEMU with -device e1000), where each operation is fully simulated by the CPU, or paravirtualization (VirtIO), which requires guest drivers, VFIO does not emulate device logic; it remaps DMA addresses in the IOMMU page tables. The difference from legacy KVM PCI passthrough (pci-assign) is stricter IOMMU table checking that prevents DMA attacks on host memory. Thanks to this, the guest interacts with the hardware directly, while the hypervisor only routes interrupts and enforces memory access boundaries.
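To make the flow concrete, here is a minimal userspace sketch following the sequence described in the kernel's VFIO documentation: open a container, attach an IOMMU group, and obtain a device file descriptor. The group number 26 and the device address 0000:06:0d.0 are hypothetical placeholders, and error handling is abbreviated.

```c
#include <fcntl.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <linux/vfio.h>

int main(void) {
    /* 1. Open a container: an empty IOMMU address space. */
    int container = open("/dev/vfio/vfio", O_RDWR);
    if (ioctl(container, VFIO_GET_API_VERSION) != VFIO_API_VERSION)
        return 1; /* unknown API version */

    /* 2. Open the IOMMU group (26 is hypothetical) and verify it is
     *    viable: every device in the group must be bound to vfio-pci
     *    or to no driver at all. */
    int group = open("/dev/vfio/26", O_RDWR);
    struct vfio_group_status status = { .argsz = sizeof(status) };
    ioctl(group, VFIO_GROUP_GET_STATUS, &status);
    if (!(status.flags & VFIO_GROUP_FLAGS_VIABLE))
        return 1;

    /* 3. Attach the group to the container and select the type1
     *    IOMMU backend. */
    ioctl(group, VFIO_GROUP_SET_CONTAINER, &container);
    ioctl(container, VFIO_SET_IOMMU, VFIO_TYPE1_IOMMU);

    /* 4. Get a file descriptor for one device in the group. */
    int device = ioctl(group, VFIO_GROUP_GET_DEVICE_FD, "0000:06:0d.0");
    printf("device fd: %d\n", device);
    return 0;
}
```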
VFIO functionality
- PCI device assignment. VFIO is a Linux kernel framework that provides secure access to PCI devices from userspace. It uses IOMMU mechanisms to isolate DMA operations, preventing the device from accessing unauthorized memory regions. This is critically important for virtualization.
- IOMMU groups. The framework operates on the concept of an IOMMU group: a set of devices that the hardware cannot isolate from one another. All devices in a group must be assigned to the same virtual machine. The administrator must check group composition via sysfs (/sys/kernel/iommu_groups/); see the enumeration sketch after this list.
- vfio-pci driver. The main interface is the vfio-pci driver: the device is unbound from its native driver (e.g., igb or nvidia) and bound to vfio-pci instead. After that, the device is managed by a userspace process through the /dev/vfio/N file descriptor.
- Device binding. Binding is done through the driver_override sysfs attribute or the driverctl utility. Example: echo "vfio-pci" > /sys/bus/pci/devices/.../driver_override, after which the device is unbound and re-probed (see the binding sketch after this list). Incorrect binding can lead to system crashes.
- /dev/vfio/vfio interface. The global control file /dev/vfio/vfio is used to open a VFIO container. A container combines one or more IOMMU groups into a single DMA address space, which allows assigning several devices to one VM without conflicts.
- Opening a group. To access a group, the process opens /dev/vfio/N, where N is the IOMMU group number. After obtaining the group file descriptor, it issues the VFIO_GROUP_GET_DEVICE_FD ioctl to get a file descriptor for a specific PCI device.
- Device information query. Through the VFIO_DEVICE_GET_REGION_INFO ioctl, the program obtains information about each device region, including the BARs (Base Address Registers): size, offset, and access flags. Similarly, VFIO_DEVICE_GET_IRQ_INFO is used to query interrupt data.
- BAR region mapping. BAR regions are mapped into the process's memory by calling mmap on the device file descriptor (see the mapping sketch after this list). This provides direct access to the device's control registers with minimal overhead. Accesses must be properly aligned.
- Interrupt handling. VFIO supports three interrupt types: legacy INTx, MSI, and MSI-X. To use MSI-X, the process calls the VFIO_DEVICE_SET_IRQS ioctl, specifying the vectors. Interrupts are delivered via eventfd descriptors, which are then read in a processing loop (see the eventfd sketch after this list).
- DMA management. To allow DMA, the process registers memory regions via the VFIO_IOMMU_MAP_DMA ioctl. The kernel pins the pages and programs the IOMMU tables, mapping I/O virtual addresses (for a VM, guest physical addresses) to host physical pages (see the DMA sketch after this list). Without this operation, the device cannot read or write data.
- Isolation and security. The key advantage of VFIO is strong isolation. Even if the device misbehaves (e.g., issues DMA to arbitrary addresses), the IOMMU blocks any access outside the allowed regions. This prevents host crashes caused by a broken guest driver.
- Virtual function support. VFIO is fully compatible with SR-IOV: the administrator can assign a virtual function (VF) of a physical device directly to a VM. To do this, the VF is first bound to vfio-pci just like a regular physical device; all VFIO properties apply to it unchanged.
- Transparent resource passthrough. Using VFIO_DEVICE_PCI_HOT_RESET, userspace can initiate a hot reset of the PCI device. The framework handles the reset correctly, reprogramming the IOMMU and notifying the process via eventfd. This is necessary for recovering after a device hang.
- Graphics adapter passthrough. For GPUs with an option ROM, access to the expansion ROM is needed. VFIO exposes it through the VFIO_PCI_ROM_REGION_INDEX region, from which the video firmware is read. Additionally, some GPUs (e.g., NVIDIA Tesla or AMD Radeon) require reset quirks in vfio-pci to reset correctly.
- Device limitations. Some devices do not support Function Level Reset (FLR). On a subsequent VM start without a host reboot, such a device may remain in an undefined state. Workarounds include selecting a bus reset through the device's sysfs reset_method attribute or power cycling the PCIe slot.
- Using VFIO with QEMU. QEMU is the primary user of VFIO. On the command line the device is given as -device vfio-pci,host=BB:DD.F (bus:device.function, e.g., host=01:00.0). QEMU automatically opens the IOMMU group, sets up DMA mappings, and handles interrupts via eventfd, linking them to the guest's virtual interrupt lines.
- Diagnostics via sysfs. VFIO status is monitored through debug entries in /sys/kernel/debug/vfio, where current DMA maps, bound groups, and interrupt counters can be viewed. The output of lspci -v is also useful to confirm that the kernel driver in use for the device is vfio-pci.
- IOMMU boot configuration. To activate VFIO, the kernel boot parameters must include intel_iommu=on (for Intel) or amd_iommu=on (for AMD). Additionally, iommu=pt is recommended to enable passthrough mode for host-owned devices, which reduces translation overhead.
- Locked memory pool. All pages involved in DMA must be pinned in RAM to prevent swapping. The process must raise its locked-memory limit (ulimit -l) and use mlockall(). QEMU does this automatically when the -realtime mlock=on parameter is enabled.
- Comparison with UIO. UIO (Userspace I/O) does not use the IOMMU and is therefore unsafe for device isolation. VFIO, unlike UIO, provides hardware-level DMA checking and allows MSI-X interrupt assignment, making it the standard production solution for passing high-performance devices to virtual machines.
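The group-composition check mentioned above can be scripted. This convenience sketch (not part of the VFIO API) walks /sys/kernel/iommu_groups and prints which devices share each group:

```c
#include <dirent.h>
#include <stdio.h>

int main(void) {
    DIR *groups = opendir("/sys/kernel/iommu_groups");
    if (!groups) return 1; /* no groups: IOMMU disabled or unsupported */

    struct dirent *g;
    while ((g = readdir(groups))) {
        if (g->d_name[0] == '.') continue;
        char path[512];
        snprintf(path, sizeof(path),
                 "/sys/kernel/iommu_groups/%s/devices", g->d_name);
        DIR *devs = opendir(path);
        if (!devs) continue;
        printf("group %s:", g->d_name);
        struct dirent *d;
        while ((d = readdir(devs)))
            if (d->d_name[0] != '.')
                printf(" %s", d->d_name); /* PCI addresses of members */
        printf("\n");
        closedir(devs);
    }
    closedir(groups);
    return 0;
}
```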
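Binding via driver_override can likewise be done programmatically by writing the same sysfs files the echo example touches. In this sketch the device address 0000:01:00.0 is hypothetical and error handling is minimal:

```c
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Write a string to a sysfs file; returns 0 on success. */
static int sysfs_write(const char *path, const char *val) {
    int fd = open(path, O_WRONLY);
    if (fd < 0) return -1;
    ssize_t n = write(fd, val, strlen(val));
    close(fd);
    return n == (ssize_t)strlen(val) ? 0 : -1;
}

int main(void) {
    const char *dev = "0000:01:00.0"; /* hypothetical device address */

    /* Prefer vfio-pci the next time this device is probed. */
    sysfs_write("/sys/bus/pci/devices/0000:01:00.0/driver_override",
                "vfio-pci");
    /* Detach the native driver, then ask the PCI core to re-probe. */
    sysfs_write("/sys/bus/pci/devices/0000:01:00.0/driver/unbind", dev);
    sysfs_write("/sys/bus/pci/drivers_probe", dev);
    return 0;
}
```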
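The region-query and BAR-mapping steps combine as follows. This sketch assumes a device fd already obtained via VFIO_GROUP_GET_DEVICE_FD (as in the earlier flow sketch) and maps BAR0 if the kernel marks it mmap-capable:

```c
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <linux/vfio.h>

/* 'device' is a VFIO device fd from VFIO_GROUP_GET_DEVICE_FD. */
void *map_bar0(int device) {
    struct vfio_region_info reg = {
        .argsz = sizeof(reg),
        .index = VFIO_PCI_BAR0_REGION_INDEX,
    };
    if (ioctl(device, VFIO_DEVICE_GET_REGION_INFO, &reg) < 0)
        return NULL;

    /* Not every BAR supports mmap (e.g., I/O port BARs do not). */
    if (!(reg.flags & VFIO_REGION_INFO_FLAG_MMAP))
        return NULL;

    /* reg.offset is the file offset that represents this region on
     * the device fd; reg.size is the BAR size. */
    void *bar = mmap(NULL, reg.size, PROT_READ | PROT_WRITE,
                     MAP_SHARED, device, reg.offset);
    return bar == MAP_FAILED ? NULL : bar;
}
```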
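For interrupts, VFIO_DEVICE_SET_IRQS carries a variable-length payload, so the request is built in a heap buffer. The sketch below wires MSI-X vector 0 of a device fd to an eventfd and reads it once; a real driver would read in a loop:

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/eventfd.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/vfio.h>

/* Route MSI-X vector 0 of a VFIO device fd to an eventfd. */
int wire_msix_vector0(int device) {
    int efd = eventfd(0, 0);
    if (efd < 0) return -1;

    /* vfio_irq_set ends in a flexible array, so allocate it with
     * room for one int32_t eventfd number. */
    size_t sz = sizeof(struct vfio_irq_set) + sizeof(int32_t);
    struct vfio_irq_set *set = malloc(sz);
    set->argsz = sz;
    set->flags = VFIO_IRQ_SET_DATA_EVENTFD | VFIO_IRQ_SET_ACTION_TRIGGER;
    set->index = VFIO_PCI_MSIX_IRQ_INDEX;
    set->start = 0;  /* first vector */
    set->count = 1;  /* wire one vector */
    memcpy(set->data, &efd, sizeof(int32_t));

    int ret = ioctl(device, VFIO_DEVICE_SET_IRQS, set);
    free(set);
    if (ret < 0) return -1;

    /* Each read returns the number of interrupts fired since the
     * last read; a processing loop would block here repeatedly. */
    uint64_t count;
    read(efd, &count, sizeof(count));
    return efd;
}
```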
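Finally, the DMA registration step. This sketch maps a 2 MiB anonymous buffer into the container's IOMMU domain; the IOVA value 0x100000 is a hypothetical choice (a VM would use guest physical addresses):

```c
#include <stdint.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <linux/vfio.h>

/* 'container' is the /dev/vfio/vfio fd with a type1 IOMMU set. */
int map_dma_buffer(int container) {
    size_t size = 2 * 1024 * 1024; /* 2 MiB buffer */

    /* Anonymous memory; the kernel pins these pages when mapped. */
    void *buf = mmap(NULL, size, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED) return -1;

    struct vfio_iommu_type1_dma_map map = {
        .argsz = sizeof(map),
        .flags = VFIO_DMA_MAP_FLAG_READ | VFIO_DMA_MAP_FLAG_WRITE,
        .vaddr = (uintptr_t)buf, /* process virtual address */
        .iova  = 0x100000,       /* IOVA the device will use (hypothetical) */
        .size  = size,
    };
    /* The kernel pins the pages and programs the IOMMU so the device
     * may DMA only into [iova, iova + size). */
    return ioctl(container, VFIO_IOMMU_MAP_DMA, &map);
}
```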
Comparison of VFIO with similar features
- VFIO vs SR-IOV. VFIO provides userspace with secure access to PCIe devices via IOMMU, while SR-IOV is a hardware method for creating virtual functions (VFs) from a single physical device. VFIO manages assigning whole devices or VFs to virtual machines, whereas SR-IOV is a partitioning mechanism with which VFIO often integrates for high performance.
- VFIO vs UIO (Userspace I/O). UIO is simpler to implement but does not use IOMMU, so DMA from the guest OS can corrupt host memory. VFIO requires IOMMU, providing isolation and security for VFIO-PCI or VFIO-MDEV, which is critical for cloud environments. UIO is only suitable for trusted drivers or debugging.
- VFIO vs KVM Device Assignment (without VFIO). The old device assignment in KVM via QEMU (pci-assign) relied on direct PCI access but did not guarantee safe DMA. VFIO with vfio-pci implements all checks through the IOMMU and the group API, preventing attacks through unvalidated DMA, interrupt injection, and raw configuration-space access.
- VFIO vs vhost-user. VFIO is designed for direct device assignment (e.g., GPU, NVMe), providing full hardware control. vhost-user operates at the level of virtqueues, allowing data exchange through shared memory without context switches; it does not provide access to real PCIe devices, but offers more flexible partitioning in network and block subsystems.
- VFIO vs vfio-mdev (mediated device). Plain VFIO assigns an entire device to a single VM. vfio-mdev allows the host driver to partition one physical device into multiple mediated devices (mdevs), each of which can be assigned to a different VM through the same VFIO interface. This is a compromise between VFIO performance and SR-IOV-style flexibility without hardware support.
OS and driver support
VFIO requires a Linux host kernel with the vfio-pci and vfio_iommu_type1 modules enabled and IOMMU support (Intel VT-d/AMD-Vi). The guest OS runs unmodified device drivers: on the host, the native driver is unbound from the device (replaced by pci-stub or vfio-pci), and the guest receives real BARs, MSI-X interrupts, and DMA through IOMMU page translations.
Security
VFIO security is based on hardware IOMMU isolation: every DMA operation from a device is checked against the address ranges valid for its domain, and a violation raises a hardware fault without touching host memory. Additional measures include enabling iommu=pt (passthrough for host-owned devices) only in trusted configurations, the vfio_iommu_type1 dma_entry_limit module parameter to cap the number of DMA mappings and prevent DoS, and IOMMU group viability checks that refuse passthrough while unrelated devices in the same group remain under host drivers.
Logging
VFIO logs through ftrace/tracefs events covering device binding, DMA mapping, IOMMU protection violations, and interrupt errors. In dmesg, the vfio-pci driver reports device claiming, IOMMU capabilities, translated page counts, and fatal failures. User utilities such as QEMU with -trace can log VFIO operations: obtaining device descriptors via VFIO_GROUP_GET_DEVICE_FD, container creation, and region mapping.
Limitations
VFIO imposes strict limitations: a device cannot be hot-unplugged while actively claimed without a full PCIe bus reset; the IOMMU group must be isolated (if ACS is unavailable, the whole group must be passed through); guest sleep states are not supported with a passed-through GPU without a special state reset through vfio_pci_core_disable; and devices that do not properly implement FLR or PM reset are difficult to recover between VM runs.
History and development
VFIO was merged in kernel 3.6 (2012), replacing legacy KVM device assignment (pci-assign) and evolving ideas from UIO. In 2015, vfio-platform was added for embedded systems. In kernel 4.10, the mediated device framework (vfio-mdev) arrived, enabling GPU and other mediated-device virtualization. From 2022 to 2024, VFIO integration with vDPA and mediation layers for virtual NVMe emerged, along with the new iommufd subsystem for secure multi-tenant device passthrough, including in containers.