VFIO-mdev (Splitting a device into virtual fragments)

VFIO-mdev allows a physical device, such as a GPU, to be split into several logical parts. Each part can be assigned directly to a different virtual machine: the data path bypasses the host, while access is coordinated by a shared mediator driver on the host.

This technology is in demand in virtual desktop infrastructure (VDI) environments and cloud platforms where hardware acceleration for graphics or GPGPU is required. Examples include NVIDIA GRID vGPU, Intel GVT-g, and specialised devices such as NVMe drives and FPGAs.

Typical issues

The main difficulty is the dependency on hardware support and proprietary firmware. Typical failures include errors when the mediator assigns memory regions, missing or lost interrupts, and conflicts with the host's native driver. Increased latency when switching between virtual instances is also observed.

How VFIO-mdev works

VFIO-mdev extends standard VFIO, which provides secure device access via the IOMMU but has no ability to split a device. Unlike regular VFIO (one device per VM) or SR-IOV (hardware virtual functions), mdev implements a mediator, a software-hardware layer in the kernel. The physical device's driver creates mdev pseudo-devices in sysfs. For each such pseudo-device, the user assigns its own set of memory pages and interrupts via the VFIO interface. The host driver intercepts and routes DMA requests, applying the constraints set by the mediator. Unlike emulation (QEMU with virtio), mdev does not emulate registers; it redirects access to the real ones, with filtering. Compared to SR-IOV, there is no requirement for hardware isolation at the PCIe level, which expands the range of supported devices but adds context-switching overhead.
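
To make the mediator layer concrete, here is a minimal sketch of the shape of a parent (mediator) driver, assuming the mdev_parent_ops interface found in kernels of the 4.10-5.x era. The my_* names are hypothetical and the callbacks are stubs; a real driver would also populate the supported-type attribute groups and implement read/write handlers for trapped registers.

    #include <linux/module.h>
    #include <linux/mm.h>
    #include <linux/mdev.h>

    /* One attribute group per supported mdev type would normally go here
     * (name, available_instances, device_api, ...); empty in this sketch. */
    static struct attribute_group *my_type_groups[] = {
        NULL,
    };

    static int my_mdev_create(struct kobject *kobj, struct mdev_device *mdev)
    {
        /* Allocate per-instance state and carve out this instance's share of
         * the parent's resources (memory, queues, interrupt vectors). */
        return 0;
    }

    static int my_mdev_remove(struct mdev_device *mdev)
    {
        /* Release the instance's resources. */
        return 0;
    }

    static long my_mdev_ioctl(struct mdev_device *mdev, unsigned int cmd,
                              unsigned long arg)
    {
        /* Handle VFIO_DEVICE_* ioctls: region info, IRQ setup, reset, ... */
        return -ENOTTY;
    }

    static int my_mdev_mmap(struct mdev_device *mdev, struct vm_area_struct *vma)
    {
        /* Map only the slice of BAR space that belongs to this instance. */
        return -EINVAL;
    }

    static const struct mdev_parent_ops my_mdev_ops = {
        .owner                 = THIS_MODULE,
        .supported_type_groups = my_type_groups,
        .create                = my_mdev_create,
        .remove                = my_mdev_remove,
        .ioctl                 = my_mdev_ioctl,
        .mmap                  = my_mdev_mmap,
    };

    /* Called from the physical device's probe path. */
    static int my_parent_probe(struct device *dev)
    {
        return mdev_register_device(dev, &my_mdev_ops);
    }

Registering these ops is what makes the mdev_supported_types directories described below appear in sysfs.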

Functionality of VFIO-mdev

  1. Operational mode of VFIO-mdev. VFIO-mdev extends standard VFIO by allowing a physical device driver to export several virtual PCI devices. These pseudo-devices are not tied to separate SR-IOV functions but are created programmatically through a mediator driver, offering flexibility for non‑standard hardware architectures.
  2. Mediator driver (mdev driver). For each physical device, an intermediary driver is created that implements the operations for creating, deleting and configuring mdev devices. It manages the splitting of hardware resources, such as video memory, command queues or I/O registers, among the virtual instances.
  3. Creating an mdev device. An administrator writes a UUID into the create file of the desired type (e.g. nvidia-xxx), such as /sys/class/mdev_bus/.../mdev_supported_types/xxx/create. In response, the kernel creates a new mdev device identified by that UUID under the parent device and links it in /sys/bus/mdev/devices/ (see the first sketch after this list).
  4. Deep isolation via IOMMU. Each mdev device is bound to its own IOMMU context. Even though there is only one physical device, hardware memory address translation and interrupt routing isolate the accesses of different guests. A guest cannot exceed its allocated region of device memory.
  5. Typical configurations (e.g. the mtty sample). sysfs provides an mdev_supported_types directory in which each subdirectory describes a specific profile: the number of virtual functions, the amount of video memory, the maximum number of instances. For example, available_instances shows how many more devices of that type can still be created.
  6. Binding VFIO to mdev. After creation, the mdev device appears as a regular VFIO-compatible object. A user process opens it via /dev/vfio/N, obtains a file descriptor and issues the VFIO_GROUP_GET_DEVICE_FD ioctl just as for a regular PCI device.
  7. Handling interrupts. Mdev uses the VFIO eventfd mechanism. The mediator driver registers a callback for hardware interrupts and converts them into eventfd signals delivered to the guest. This provides low latency without sacrificing throughput (see the second sketch after this list).
  8. Memory management and pinning. The guest performs DMA only into its own region of host physical memory. The mdev driver intercepts VFIO_IOMMU_MAP_DMA calls, pins memory pages via get_user_pages and programs the hardware IOMMU/TLB for that virtual instance.
  9. Resource quotas and QoS. The mediator driver can limit DMA bandwidth, command processing rate or interrupt frequency. Parameters are set via sysfs when creating the mdev. For example, for a GPU you can allocate 50% of compute blocks with hardware multiplexing.
  10. Lifecycle and hot unplug. An mdev device is removed by writing 1 to the remove file inside its directory (the removal step also appears in the first sketch after this list). The kernel revokes all VFIO descriptors, disallows new DMA mappings and notifies the mediator driver through VFIO notifier events. A guest can survive such a removal as a hot unplug.
  11. Compatibility with KVM. Integration via KVM API: the mdev device registers its kvm_device_ops. This allows using kvm_set_irq_routing and kvm_irqfd for fast MSI‑X interrupt delivery. KVM directly calls mdev handlers during MMIO/PIO operations without switching to userspace.
  12. MMIO traps. Guest access to configuration space and BARs is intercepted. The mdev driver emulates status registers, redirecting writes to actual hardware control. For example, a queue reset bit is translated into resetting the corresponding virtual channel.
  13. No SR-IOV requirement. Mdev does not require SR-IOV support in hardware. It works with any device whose driver implements the mdev_parent_ops interface. This is the key difference: virtualisation is created at the driver level, not via a PCI bridge, which is useful for GPUs, accelerators and FPGAs.
  14. Virtual functions and migration. Some mdev drivers support vfio_mdev_ops with save_state and load_state methods. This allows suspending the virtual device, serialising its internal state such as a command queue, and restoring it on another host without rebooting the guest.
  15. Security and DMA attacks. Since each mdev runs in its own IOMMU domain, a guest cannot DMA‑attack the memory of other guests. However, the physical device itself may have flaws in its sharing logic – the mdev driver must reliably isolate status registers. Responsibility lies with the driver implementation.
  16. Integration with cgroups. Binding an mdev device to a process via VFIO automatically includes memory accounting in that process's cgroup: all pinned pages are counted against the process's memory limits. The host I/O scheduler can limit bandwidth via blocking operations inside the mdev driver.
  17. Example: vfio-mdev for graphics. The NVIDIA vGPU driver uses mdev to create up to 8 virtual GPUs on a single physical one. Each is assigned its own amount of video memory (1G, 2G or 4G) and a number of multiprocessors. Guests see a full PCI device served by the NVIDIA driver.
  18. Example: media accelerators. Intel QuickAssist (QAT), exposed via mdev, splits the cryptographic engine into 16 virtual devices. The guest sends requests via a DMA queue, and the mdev driver multiplexes them into the hardware queue, ensuring even load distribution.
  19. Debugging via tracepoints. The kernel provides events: mdev_create, mdev_remove, mdev_ioctl, mdev_mmap. Enabling them via ftrace allows tracking which operations the guest performs and how the mediator driver responds. This is critical when developing a new mdev driver.
  20. Performance limitations. Mdev does not add significant DMA overhead (the data path goes directly through the IOMMU), but MMIO emulation can become a bottleneck if the guest frequently reads status registers. For such scenarios, the mdev driver may cache state or use shared memory.
  21. Outlook and replacement of VFIO-PCI. Mdev is gradually replacing static partitioning via VFIO-PCI plus SR-IOV in areas where hardware virtualisation is not available. VFIO-mdev has been part of the mainline Linux kernel since version 4.10, and the number of supporting drivers is growing.
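
The sysfs lifecycle from items 3 and 10 can be driven from any language; below is a small userspace C sketch that writes a UUID into a type's create file and later removes the instance. The parent device path, the type name (nvidia-xxx) and the UUID are placeholders that must match the actual hardware and mediator driver.

    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    /* Placeholders: the real parent device path, type name and UUID depend on
     * the hardware and the mediator driver (cf. mdev_supported_types). */
    #define TYPE_DIR  "/sys/class/mdev_bus/0000:00:02.0/mdev_supported_types/nvidia-xxx"
    #define MDEV_UUID "83b8f4f2-509f-382f-3c1e-e6bfe0fa1001"

    static int write_str(const char *path, const char *val)
    {
        int fd = open(path, O_WRONLY);
        if (fd < 0) {
            perror(path);
            return -1;
        }
        ssize_t n = write(fd, val, strlen(val));
        close(fd);
        return n < 0 ? -1 : 0;
    }

    int main(void)
    {
        /* Create the instance: equivalent to `echo <uuid> > .../create`. */
        if (write_str(TYPE_DIR "/create", MDEV_UUID))
            return 1;
        printf("created mdev %s\n", MDEV_UUID);

        /* ... hand the device to a VM via the VFIO API ... */

        /* Remove it: equivalent to `echo 1 > /sys/bus/mdev/devices/<uuid>/remove`. */
        return write_str("/sys/bus/mdev/devices/" MDEV_UUID "/remove", "1") ? 1 : 0;
    }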
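
For item 7, here is a sketch of how userspace wires an eventfd to one of the device's interrupts through VFIO_DEVICE_SET_IRQS, assuming a VFIO device fd already obtained as in item 6. The interrupt index 0 is a placeholder: real code first queries VFIO_DEVICE_GET_IRQ_INFO for the indices (INTx/MSI/MSI-X) the device exposes.

    #include <stdlib.h>
    #include <string.h>
    #include <sys/eventfd.h>
    #include <sys/ioctl.h>
    #include <linux/vfio.h>

    /* Attach an eventfd as the trigger for interrupt index 0 of device_fd. */
    static int attach_irq_eventfd(int device_fd)
    {
        int efd = eventfd(0, EFD_CLOEXEC);
        size_t sz = sizeof(struct vfio_irq_set) + sizeof(int);
        struct vfio_irq_set *set = calloc(1, sz);

        set->argsz = sz;
        set->flags = VFIO_IRQ_SET_DATA_EVENTFD | VFIO_IRQ_SET_ACTION_TRIGGER;
        set->index = 0;                          /* placeholder interrupt index */
        set->start = 0;
        set->count = 1;
        memcpy(set->data, &efd, sizeof(int));    /* one eventfd per vector */

        int ret = ioctl(device_fd, VFIO_DEVICE_SET_IRQS, set);
        free(set);
        return ret < 0 ? -1 : efd;               /* reads on efd report interrupts */
    }

A hypervisor such as QEMU typically hands this eventfd to KVM as an irqfd, so the mediator's interrupt callback reaches the guest without a round trip through userspace.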

Comparison of VFIO-mdev with similar features

  • VFIO-mdev vs SR-IOV. VFIO-mdev allows splitting a single physical device into several virtual ones via a mediator driver, while SR-IOV requires hardware support for PF/VF and rigid resource partitioning. Mdev is more flexible for non-standard devices (GPU, NPU) but incurs more CPU overhead, whereas SR-IOV provides native performance and isolation in virtualised environments.
  • VFIO-mdev vs VirtIO. VirtIO is paravirtualisation focused on high throughput via shared ring buffers but without direct access to real device addresses. VFIO-mdev gives the guest direct access to physical functions via IOMMU, reducing data transfer latency. However, mdev requires support from the device driver, while VirtIO is universal for networks and disks.
  • VFIO-mdev vs VFIO-passthrough. VFIO-passthrough gives an entire physical device to a single virtual machine without splitting, providing maximum performance and isolation. VFIO-mdev splits one device among several guests, which is useful for GPUs and accelerators. The downside of mdev is a more complex management model and potential bottlenecks in the mediator, especially under heavy I/O.
  • VFIO-mdev vs vDPA. vDPA (vhost Data Path Acceleration) combines a virtio interface with direct access to a hardware accelerator, bypassing a hypervisor-side mediator. VFIO-mdev works at the IOMMU group level and requires a mediator driver for each device type. vDPA is better for low-latency network devices, while mdev is preferable for complex logic such as splitting GPUs among processes.
  • VFIO-mdev vs Xen PV Passthrough. Xen PV Passthrough uses paravirtual drivers in the guest domain (domU) to access a real device via shared memory with the hypervisor, which reduces performance due to context switches. VFIO-mdev, running under KVM, leverages hardware virtualisation and the IOMMU to achieve near-native speed. However, Xen isolates faulty drivers better, whereas mdev requires a trusted mediator from the manufacturer.

OS and driver support

VFIO-mdev requires support at the Linux kernel level starting from version 4.10: the mdev core (vfio-mdev) registers parent devices, and device-specific mediator drivers such as i915 GVT-g, NVIDIA vGPU and AMD SR-IOV-based drivers create virtual instances through sysfs interfaces. User-space components such as QEMU and kvmtool then open /dev/vfio/<device> to assign the mdev device directly to a virtual machine via the VFIO API.
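
A compressed sketch of that user-space flow, following the VFIO API: attach the mdev's IOMMU group to a container, map guest memory for DMA, and fetch the device fd by the mdev UUID. The group number and UUID are placeholders, and error handling (including the usual VFIO_GROUP_GET_STATUS check) is omitted for brevity.

    #include <fcntl.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <sys/ioctl.h>
    #include <sys/mman.h>
    #include <unistd.h>
    #include <linux/vfio.h>

    int main(void)
    {
        /* Placeholders: the IOMMU group number and UUID of a created mdev. */
        const char *group_path = "/dev/vfio/26";
        const char *mdev_uuid  = "83b8f4f2-509f-382f-3c1e-e6bfe0fa1001";

        int container = open("/dev/vfio/vfio", O_RDWR);
        int group     = open(group_path, O_RDWR);

        /* Attach the group to the container and select the Type1 IOMMU backend. */
        ioctl(group, VFIO_GROUP_SET_CONTAINER, &container);
        ioctl(container, VFIO_SET_IOMMU, VFIO_TYPE1_IOMMU);

        /* Map 1 MiB of anonymous memory for guest DMA at IOVA 0; the mediator
         * driver pins these pages and programs the IOMMU for this instance. */
        struct vfio_iommu_type1_dma_map dma_map = {
            .argsz = sizeof(dma_map),
            .flags = VFIO_DMA_MAP_FLAG_READ | VFIO_DMA_MAP_FLAG_WRITE,
            .vaddr = (uintptr_t)mmap(NULL, 1 << 20, PROT_READ | PROT_WRITE,
                                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0),
            .iova  = 0,
            .size  = 1 << 20,
        };
        ioctl(container, VFIO_IOMMU_MAP_DMA, &dma_map);

        /* An mdev is addressed by its UUID when requesting the device fd. */
        int device = ioctl(group, VFIO_GROUP_GET_DEVICE_FD, mdev_uuid);

        struct vfio_device_info info = { .argsz = sizeof(info) };
        ioctl(device, VFIO_DEVICE_GET_INFO, &info);
        printf("regions: %u, irqs: %u\n", info.num_regions, info.num_irqs);
        return 0;
    }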

Security

Security is built on hardware-isolated contexts: the mdev driver uses the IOMMU for strict translation of guest DMA addresses, preventing access to host memory or to that of other VMs. Interrupt descriptors such as MSI-X are managed through the shared VFIO infrastructure, but each mdev instance operates within its own protected domain, isolated in the same way as devices bound to pci-stub or vfio-pci, and has no direct access to the physical device's global resources except through gateways controlled by the hypervisor.

Logging

VFIO-mdev logging is implemented at three levels: at the kernel level, using ftrace and tracepoints (vfio_mdev_mmap, vfio_mdev_ioctl, vfio_mdev_read/write) to track DMA and register operations; at the mediator-driver level, using printk or dev_dbg in the driver (e.g. around mdev_parent_dev) to log instance creation, destruction and BAR access errors; and at the user level, via sysfs files such as mdev_supported_types/<type>/available_instances and log files with isolation-violation statistics collected by libvirt or specialised daemons.

Limitations

Key limitations include: the inability to perform live migration for most mdev devices, due to the lack of a standardized mechanism for saving and restoring the internal state of GPUs or accelerators at the VFIO level; lack of support for resetting an individual mdev instance without resetting the entire physical device (Intel GVT-g, for example, requires a full GPU reset); dependence on specific host hardware models and firmware versions, since virtualisation features such as the IOMMU must be explicitly enabled in the BIOS and the mdev types must be exported by the driver; and limited bandwidth for BAR memory operations due to the shared path to the physical device, leading to latency under heavy concurrent access.

History and development

The technology emerged as an extension of classic VFIO for virtualizing entire GPUs and accelerators, starting with patches from NVIDIA and Intel in 2015-2016. It was officially merged into the Linux 4.10 kernel together with the mdev core mechanism and the first sample driver (mtty). It then evolved with the addition of i915 GVT-g in 2017, support for vfio-mdev for media devices via mediatek in 2018, and the introduction of mdev_parent_ops with a compound region interface. In modern kernels, work continues on unifying VFIO-mdev with the vfio-pci core and on adding partial live migration support through custom driver methods using migration region ops.