VFIO-mdev allows a physical device, such as a GPU, to be split into several logical parts. Each part is assigned directly to a different virtual machine, bypassing the host while going through a shared mediator driver.
This technology is in demand in virtual desktop infrastructure (VDI) environments and cloud platforms where hardware acceleration for graphics or GPGPU workloads is required. Examples include NVIDIA GRID vGPU, Intel GVT-g, and specialised devices such as NVMe drives and FPGAs.
Typical issues
The main difficulty is the dependency on hardware support and proprietary firmware. Typical failures include errors when the mediator assigns memory regions, missing interrupts, and conflicts with the host's native driver. Increased latency when switching between virtual functions is also observed.
How VFIO-mdev works
VFIO-mdev extends standard VFIO, which provides secure device access via the IOMMU but cannot split a device. Unlike regular VFIO (one device per VM) or SR-IOV (hardware virtual functions), mdev implements a mediator: a software mediation layer in the kernel. The physical device's driver creates mdev pseudo-devices in sysfs. For each such pseudo-device, the user assigns its own set of memory pages and interrupts via the VFIO interface. The host driver intercepts and routes DMA requests, applying the constraints set by the mediator. Unlike emulation (QEMU with virtio), mdev does not emulate registers; it redirects real ones, but with filtering. Compared to SR-IOV, there is no requirement for hardware isolation at the PCIe level, which broadens the range of supported devices but adds context-switching overhead.
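As a sketch of the sysfs flow described above, creating an instance boils down to writing a UUID into the chosen type's create node under the parent device. The parent address 0000:00:02.0 and type name i915-GVTg_V5_4 used in the test are illustrative assumptions, not values from this text:

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Compose the sysfs path of the "create" node for a parent device and
 * mdev type: /sys/class/mdev_bus/<parent>/mdev_supported_types/<type>/create */
int mdev_create_path(const char *parent, const char *type,
                     char *buf, size_t len)
{
    int n = snprintf(buf, len,
        "/sys/class/mdev_bus/%s/mdev_supported_types/%s/create",
        parent, type);
    return (n > 0 && (size_t)n < len) ? 0 : -1;
}

/* Ask the mediator driver to create an mdev instance with the given UUID.
 * Requires root and a driver that actually exports this type. */
int mdev_create(const char *parent, const char *type, const char *uuid)
{
    char path[512];
    if (mdev_create_path(parent, type, path, sizeof(path)))
        return -1;
    FILE *f = fopen(path, "w");
    if (!f)
        return -errno;
    int rc = (fputs(uuid, f) >= 0) ? 0 : -1;
    if (fclose(f) != 0)   /* sysfs reports validation errors on flush/close */
        rc = -1;
    return rc;
}
```

On success the kernel instantiates the pseudo-device, which then appears under the mdev bus with the chosen UUID as its name.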
Functionality of VFIO-mdev
- Operational mode of VFIO-mdev. VFIO-mdev extends standard VFIO by allowing a physical device driver to export several virtual PCI devices. These pseudo-devices are not tied to separate SR-IOV functions but are created programmatically through a mediator driver, offering flexibility for non‑standard hardware architectures.
- Mediator driver (mdev driver). For each physical device, an intermediary driver is created that implements the operations for creating, deleting and configuring mdev devices. It manages the splitting of hardware resources, such as video memory, command queues or I/O registers, among virtual instances.
- Creating an mdev device. An administrator picks a device type (e.g. nvidia-xxx) and writes a unique UUID into the special sysfs file /sys/class/mdev_bus/.../mdev_supported_types/xxx/create. In response, the kernel creates a new device with that GUID in the $DEVPATH/mdev_devices/ directory.
- Deep isolation via IOMMU. Each mdev device is bound to a separate IOMMU domain. Even though the physical device is a single one, hardware memory address translation and interrupt isolation separate the accesses of different guests. A guest cannot exceed its allocated region of device memory.
- Typical configurations (mtty). sysfs provides a directory mdev_supported_types where each subdirectory describes a specific profile: the number of virtual functions, the amount of video memory, the maximum number of instances. For example, available_instances shows how many more devices can be created.
- Binding VFIO to mdev. After creation, the mdev device appears as a regular VFIO-compatible object. A user process opens it via /dev/vfio/N, obtains a file descriptor and executes the ioctl VFIO_GROUP_GET_DEVICE_FD just as for a regular PCI device.
- Handling interrupts. Mdev uses the VFIO eventfd mechanism. The mediator driver registers a callback for hardware interrupts and converts them into a signal delivered via eventfd into the guest address space. This provides low latency without losing throughput.
- Memory management and pinning. The guest performs DMA only into its own region of host physical memory. The mdev driver intercepts VFIO_IOMMU_MAP_DMA calls, pins memory pages via get_user_pages and programs the hardware IOMMU/TLB for that virtual instance.
- Resource quotas and QoS. The mediator driver can limit DMA bandwidth, command processing rate or interrupt frequency. Parameters are set via sysfs when creating the mdev. For example, for a GPU you can allocate 50% of the compute blocks with hardware multiplexing.
- Lifecycle and hot unplug. An mdev device is removed by writing 1 to the remove file inside its directory. The kernel revokes all VFIO descriptors, disallows new DMA mappings and notifies processes via the VFIO_GROUP_NOTIFY_SET_KVM event. A guest can survive such removal.
- Compatibility with KVM. Integration happens via the KVM API: the mdev device registers its kvm_device_ops. This allows using kvm_set_irq_routing and kvm_irqfd for fast MSI-X interrupt delivery. KVM directly calls mdev handlers during MMIO/PIO operations without switching to userspace.
- MMIO traps. Guest access to the configuration space and BARs is intercepted. The mdev driver emulates status registers, redirecting writes to the actual hardware controls. For example, a queue-reset bit is translated into a reset of the corresponding virtual channel.
- Absence of SR-IOV. Mdev does not require SR-IOV support in hardware. It works with any device whose driver implements the mdev_parent_ops interface. This is a key difference: virtualisation is created at the driver level, not via a PCI bridge, which is useful for GPUs, accelerators and FPGAs.
- Virtual functions and migration. Some mdev drivers support vfio_mdev_ops with save_state and load_state methods. This allows suspending the virtual device, serialising its internal state (such as a command queue), and restoring it on another host without rebooting the guest.
- Security and DMA attacks. Since each mdev runs in its own IOMMU domain, a guest cannot DMA-attack the memory of other guests. However, the physical device itself may have flaws in its sharing logic, so the mdev driver must reliably isolate status registers. Responsibility lies with the driver implementation.
- Integration with cgroups. Binding an mdev device to a process via VFIO automatically includes memory accounting in the cgroup. All pinned pages are counted against memory.kmem.pages_limit. The host I/O scheduler can limit bandwidth via blocking operations inside the mdev driver.
- Example: vfio-mdev for graphics. The NVIDIA vGPU driver uses mdev to create up to 8 virtual GPUs on a single physical one. Each is assigned its own video memory size (1G, 2G or 4G) and a number of multiprocessors. Guests see a full PCI device with the NVIDIA driver.
- Example: media accelerators. Intel QuickAssist (QAT) via mdev splits the cryptographic engine into 16 virtual devices. The guest sends requests via a DMA queue, and the mdev driver multiplexes them into the hardware queue, ensuring even load distribution.
- Debugging via tracepoints. The kernel provides the events mdev_create, mdev_remove, mdev_ioctl and mdev_mmap. Enabling them via ftrace allows tracking which operations the guest performs and how the mediator driver responds. This is critical when developing a new mdev driver.
- Performance limitations. Mdev does not add significant DMA overhead (the path through the IOMMU is direct), but MMIO emulation can become a bottleneck if the guest frequently reads status registers. For such scenarios, the mdev driver may cache states or use shared memory.
- Outlook and replacement of VFIO-PCI. Mdev is gradually replacing static partitioning via VFIO-PCI plus SR-IOV in areas where hardware virtualisation is unavailable. The VFIO-mdev framework has been part of the mainline Linux kernel since version 4.10, and the number of supporting drivers is growing.
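The binding steps in the list above can be sketched against the VFIO userspace API. This is a minimal sketch, not a complete program: the IOMMU group number and the instance UUID are placeholders that would in practice come from the device's iommu_group symlink and from the create step, and error paths leak the opened descriptors for brevity:

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/vfio.h>

/* Compose the group node path, e.g. /dev/vfio/12 for IOMMU group 12. */
int vfio_group_path(int group_num, char *buf, size_t len)
{
    int n = snprintf(buf, len, "/dev/vfio/%d", group_num);
    return (n > 0 && (size_t)n < len) ? 0 : -1;
}

/* Open an mdev instance: container -> group -> device fd.
 * For mdev, the "device name" passed to VFIO_GROUP_GET_DEVICE_FD
 * is the instance UUID. Returns a device fd, or -1 on failure. */
int open_mdev(int group_num, const char *uuid)
{
    char path[64];
    if (vfio_group_path(group_num, path, sizeof(path)))
        return -1;

    int container = open("/dev/vfio/vfio", O_RDWR);
    int group = open(path, O_RDWR);
    if (container < 0 || group < 0)
        return -1;

    /* The group must be viable (all devices in it bound to VFIO). */
    struct vfio_group_status status = { .argsz = sizeof(status) };
    if (ioctl(group, VFIO_GROUP_GET_STATUS, &status) ||
        !(status.flags & VFIO_GROUP_FLAGS_VIABLE))
        return -1;

    /* Attach the group to the container, then pick the IOMMU backend. */
    if (ioctl(group, VFIO_GROUP_SET_CONTAINER, &container) ||
        ioctl(container, VFIO_SET_IOMMU, VFIO_TYPE1_IOMMU))
        return -1;

    return ioctl(group, VFIO_GROUP_GET_DEVICE_FD, uuid);
}
```

The returned device fd then accepts the usual VFIO device ioctls (VFIO_DEVICE_GET_INFO, region and IRQ queries), exactly as for a physical PCI device.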
Comparison of VFIO-mdev with similar features
- VFIO-mdev vs SR-IOV. VFIO-mdev allows splitting a single physical device into several virtual ones via a mediator driver, while SR-IOV requires hardware PF/VF support and rigid resource partitioning. Mdev is more flexible for non-standard devices (GPU, NPU) but incurs more CPU overhead, whereas SR-IOV provides native performance and isolation in virtualised environments.
- VFIO-mdev vs VirtIO. VirtIO is paravirtualisation focused on high throughput via shared ring buffers but without direct access to real device addresses. VFIO-mdev gives the guest direct access to physical functions via IOMMU, reducing data transfer latency. However, mdev requires support from the device driver, while VirtIO is universal for networks and disks.
- VFIO-mdev vs VFIO-passthrough. VFIO-passthrough gives an entire physical device to a single virtual machine without splitting, providing maximum performance and isolation. VFIO-mdev splits one device among several guests, which is useful for GPUs and accelerators. The downside of mdev is a more complex management model and potential bottlenecks in the mediator, especially under heavy I/O.
- VFIO-mdev vs vDPA. vDPA (vhost Data Path Acceleration) combines a virtio interface with direct access to a hardware accelerator, bypassing the hypervisor mediator. VFIO-mdev works at the IOMMU group level and requires a mediator driver for each device type. vDPA is better for low‑latency network devices, while mdev is preferable for complex logic such as splitting GPUs among processes.
- VFIO-mdev vs Xen PV Passthrough. Xen PV Passthrough uses paravirtual drivers in the guest domain (domU) to access a real device via shared memory with the hypervisor, which reduces performance due to context switches. VFIO-mdev, running under KVM, leverages hardware virtualisation and the IOMMU to achieve near‑native speed. However, Xen isolates faulty drivers better, whereas mdev requires a trusted mediator from the manufacturer.
OS and driver support
VFIO-mdev requires support at the Linux kernel level starting from version 4.10, where mediator drivers such as vfio-mdev register a parent device, and device-specific drivers like i915 GVT-g, NVIDIA vGPU, and AMD SR-IOV create virtual instances through sysfs interfaces. User-space components such as QEMU and kvmtool then open /dev/vfio/<device> to assign the mdev device directly to a virtual machine using the VFIO API.
Security
Security is built on hardware-isolated contexts: the mdev driver uses the IOMMU for strict translation of guest DMA addresses, preventing access to host memory or to that of other VMs. It also manages interrupt descriptors such as MSI-X through the shared VFIO bus, but each mdev instance operates within its own protected domain via pci-stub or vfio-pci, without direct access to the physical device's global resources except through gateways controlled by the hypervisor.
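The DMA translation described above is set up from userspace with VFIO_IOMMU_MAP_DMA: the kernel pins the pages and programs the IOMMU so the device can only reach the mapped window. A minimal sketch, assuming an already-configured VFIO container fd; the 4 KiB page size and the IOVA layout are illustrative choices:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/vfio.h>

/* DMA windows must be page-aligned; a quick check used below. */
int is_aligned(uint64_t value, uint64_t page)
{
    return page != 0 && (value % page) == 0;
}

/* Pin a userspace buffer and make it visible to the device at `iova`
 * inside this mdev's IOMMU domain. `container` is an already-configured
 * VFIO container fd. Returns the ioctl result (0 on success). */
int map_guest_buffer(int container, void *vaddr, uint64_t iova, uint64_t size)
{
    if (!is_aligned(iova, 4096) || !is_aligned(size, 4096))
        return -1;

    struct vfio_iommu_type1_dma_map map;
    memset(&map, 0, sizeof(map));
    map.argsz = sizeof(map);
    map.flags = VFIO_DMA_MAP_FLAG_READ | VFIO_DMA_MAP_FLAG_WRITE;
    map.vaddr = (uint64_t)(uintptr_t)vaddr; /* pinned in-kernel via get_user_pages */
    map.iova  = iova;                       /* address the device will use for DMA */
    map.size  = size;
    return ioctl(container, VFIO_IOMMU_MAP_DMA, &map);
}
```

Any device access outside a mapped IOVA range faults in the IOMMU instead of reaching host memory, which is what confines a misbehaving guest.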
Logging
VFIO-mdev logging is implemented at three levels:
- Kernel level: ftrace and tracepoints (vfio_mdev_mmap, vfio_mdev_ioctl, vfio_mdev_read/write) to track DMA operations and register accesses.
- Mediator driver level: printk or dev_dbg in the driver (for example around mdev_parent_dev) to log instance creation, destruction, and BAR access errors.
- User level: sysfs files such as mdev_supported_types/<type>/available_instances, plus log files with isolation-violation statistics collected by libvirt or specialised daemons.
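Enabling one of the tracepoints named above can be done by writing into tracefs. A small sketch; the tracefs mount point /sys/kernel/tracing and the event group name vfio_mdev are assumptions about the running kernel, not values guaranteed by this text:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Build the tracefs "enable" path for an event group/name, e.g.
 * /sys/kernel/tracing/events/vfio_mdev/vfio_mdev_ioctl/enable */
int trace_enable_path(const char *group, const char *event,
                      char *buf, size_t len)
{
    int n = snprintf(buf, len, "/sys/kernel/tracing/events/%s/%s/enable",
                     group, event);
    return (n > 0 && (size_t)n < len) ? 0 : -1;
}

/* Write "1" to the enable node; requires root and a kernel that
 * actually exposes the event. Returns 0 on success. */
int trace_enable(const char *group, const char *event)
{
    char path[512];
    if (trace_enable_path(group, event, path, sizeof(path)))
        return -1;
    FILE *f = fopen(path, "w");
    if (!f)
        return -1;
    int rc = (fputc('1', f) == '1') ? 0 : -1;
    fclose(f);
    return rc;
}
```

The resulting events then show up in the trace ring buffer (trace_pipe), letting a driver developer correlate guest operations with mediator responses.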
Limitations
Key limitations include:
- No live migration for most mdev devices, due to the lack of a standardised mechanism for saving and restoring the internal state of GPUs or accelerators at the VFIO level.
- No support for resetting an individual mdev instance without resetting the entire physical device; Intel GVT-g, for example, requires a full GPU reset.
- Dependence on specific host hardware models and firmware versions: SR-IOV must be explicitly enabled in the BIOS, and mdev types must be exported by the driver.
- Limited bandwidth for BAR memory operations because of the shared bus to the physical device, leading to latency under heavy concurrent access.
History and development
The technology emerged as an extension of classic VFIO for virtualizing entire GPUs and accelerators, starting with patches from NVIDIA and Intel in 2015-2016. It was officially merged into the Linux 4.10 kernel with the mdev core mechanism and the first driver vgpu_sim. It then evolved with the addition of i915 GVT-g in 2017, support for vfio-mdev for media devices via mediatek in 2018, and the introduction of mdev_parent_ops with a compound region interface. In modern kernels, work continues on unifying VFIO-mdev with the vfio-pci core and adding partial live migration support through custom driver methods using migration region ops.