GPU-PV (GPU paravirtualization) lets the host system give a virtual machine secure, low-overhead access to the graphics processor. Instead of full emulation, the guest driver coordinates with the hypervisor, sharing a single GPU among multiple VMs almost as if on bare metal.
This technology is in demand in VDI environments (such as Microsoft Azure Virtual Desktop) to accelerate office applications and browsers. It is also used in cloud gaming services and on workstations where multiple lightweight graphics VMs need to run without the cost of full hardware virtualization like SR-IOV.
Typical Problems
The main difficulty is incomplete driver compatibility: the NVIDIA or AMD version in the guest OS must strictly match the host version. Memory leaks and crashes may occur during user switching. Additionally, not all DirectX 12 Ultimate features are supported, and under heavy load on one VM, the performance of neighboring VMs suffers due to cache conflicts.
How GPU-PV works
Unlike full emulation (e.g., QEMU with VirtIO-GPU), where the hypervisor intercepts every command and translates it into CPU instructions, GPU-PV works through a shared command buffer. The guest driver builds execution queues (ring buffers), and the hypervisor (Hyper-V or KVM with mdev) remaps video memory through IOMMU without emulating device registers. Unlike hardware passthrough (via VFIO), which dedicates an entire GPU to a single VM, and unlike SR-IOV, which requires special virtual chip functions, GPU-PV dynamically divides one physical adapter at the system call level. This is a compromise: lower overhead than emulation, but higher latency than direct access, plus no power isolation. The mechanism resembles disk paravirtualization: the guest knows it is running under management and uses a simplified exchange protocol through the hypervisor rather than emulated registers.
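The shared command buffer described above can be sketched as a toy model. Everything here (class names, the ring layout, the command strings) is invented for illustration, not a real hypervisor API; the point is that the hypervisor forwards commands verbatim instead of decoding emulated device registers:

```python
from collections import deque

class GuestRing:
    """Guest-side command ring: the frontend driver appends GPU commands here."""
    def __init__(self, size):
        self.size = size
        self.buf = [None] * size
        self.head = 0   # next slot the hypervisor will consume
        self.tail = 0   # next slot the guest will write

    def submit(self, cmd):
        nxt = (self.tail + 1) % self.size
        if nxt == self.head:
            raise RuntimeError("ring full: guest must wait for the hypervisor")
        self.buf[self.tail] = cmd
        self.tail = nxt

class Hypervisor:
    """Drains guest rings into a single physical GPU queue without decoding
    individual commands (no register emulation), tagging each with its VM."""
    def __init__(self):
        self.gpu_queue = deque()

    def drain(self, ring, vm_id):
        moved = 0
        while ring.head != ring.tail:
            # Commands are copied verbatim; only a VM tag is added for scheduling.
            self.gpu_queue.append((vm_id, ring.buf[ring.head]))
            ring.head = (ring.head + 1) % ring.size
            moved += 1
        return moved

ring = GuestRing(8)
for cmd in ["DRAW", "DISPATCH", "PRESENT"]:
    ring.submit(cmd)
hv = Hypervisor()
print(hv.drain(ring, vm_id=0))   # → 3
print(list(hv.gpu_queue))
```

Contrast this with full emulation, where every register write would trap into the hypervisor: here the guest fills the ring freely and only the drain step crosses the VM boundary.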
Key Functions of GPU-PV
- Time-sharing model. GPU-PV is based on time-division multiplexing: the hypervisor quantizes GPU time into micro-intervals (typically 1–10 ms) and cyclically assigns them to different VMs. Efficiency formula: U_gpu = Σ(u_i), where u_i is the utilization of the i-th VM and the sum is ≤ 1.
- Command transfer mechanism. The VM receives a virtual GPU instance through a paravirtualized frontend driver. Commands from the VM’s command buffer are copied by the hypervisor into the physical GPU ring buffer without repacking, reducing overhead.
- Address space management. The hypervisor maintains nested page tables (NPT) for the GPU-MMU. The virtual GPU address in the VM is translated to the physical device address through two levels: GPA → HPA and GVA → GPA with boundary checking.
- Memory isolation. IOMMU (Input-Output Memory Management Unit) technology provides hardware isolation of DMA operations for each VM. Protection formula: P = V(i) AND M(i), where V(i) is address validity and M(i) is the ownership of VM i.
- Hypervisor scheduler. In Hyper-V (Microsoft), a proprietary scheduler with fixed time slices is used. Parameters: Time Slice (TS) and Switch Period (SP). Throughput: Throughput = Σ(TS_k / SP).
- GPU contexts. Each VM owns an isolated execution context, including command queues, status registers, and caches (L2, texture). Context switching requires saving/restoring up to 1 MB of state on modern GPUs.
- Video memory access. GPU-PV does not emulate video memory but allocates physical VRAM regions through BAR (Base Address Register). The virtual VM sees a contiguous range that the hypervisor maps to real pages using mapping tables.
- Interrupt handling. Physical GPUs generate MSI-X interrupts. The hypervisor intercepts them, identifies the target VM by context ID, and retransmits them as paravirtualized interrupts with a virtual vector.
- Commands with side effects. Some GPU instructions (e.g., cache flush or power mode switch) cannot be virtualized. The hypervisor emulates them via traps and executes them on behalf of the VM after verifying permissions.
- Queue access model. The VM’s frontend driver places commands in a ring buffer in memory accessible to the hypervisor. The hypervisor copies them to the physical GPU queue using double buffering. Copy latency: L = L_queue + L_copy.
- Fence synchronization. The fence mechanism (completion notification) is virtualized through shadow registers. The hypervisor replaces the physical fence address with a shadow address and, after real execution, generates a signal for the VM. Wait formula: T_wait = max(T_real, T_sched).
- Error management. When one VM fails (e.g., a GPU hang), the hypervisor isolates its context, resets the GPU’s logical channel, and restarts the driver without crashing other VMs. This mechanism is also available in NVIDIA vGPU and Intel GVT-g.
- Dynamic quantum adjustment. Advanced implementations support adaptive scheduling: priority VMs receive an increased quantum TS_high = TS_base * (1 + α), where α is the priority coefficient (0…2). Background tasks reduce their share accordingly.
- Performance monitors. The hypervisor collects counters: number of context switches, average utilization, PCIe bus bandwidth. Efficiency metric: E = (Σ Completed_operations) / (Σ Allocated_quanta * Max_ops_per_quant).
- Buffer exchange protocol. For zero-copy transfers, a shared memory mechanism with descriptors is used. The VM passes a buffer pointer to the hypervisor via a hypercall, and the hypervisor registers it in the IOMMU.
- Architectural limitations. GPU-PV does not support all SR-IOV features (Single Root I/O Virtualization) — for example, virtual functions (VF) with separate PCIe space. There is no hardware isolation at the shader cache level.
- Impact on latency. In the worst case, command latency includes waiting out another VM’s quantum: L_max = L_exec + TS_max. For interactive applications, quanta of ≤ 2 ms are critical; under overload, latency grows quadratically.
- Snapshot state. GPU-PV allows saving the VM’s GPU state (queues, command cache) for live migration. The hypervisor takes a snapshot via a helper driver, blocking new command input for about 50 ms per gigabyte of state.
- API compatibility. The VM sees the full set of graphics APIs (DirectX 12, Vulkan, OpenGL) without changes to applications. The hypervisor intercepts only low-level commands through the paravirtualized driver; the rest execute natively.
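To make the scheduling arithmetic above concrete, here is a small sketch with made-up slice values, combining the utilization bound U_gpu = Σ(u_i) ≤ 1, the throughput figure Σ(TS_k / SP), and the worst-case latency L_max = L_exec + TS_max:

```python
# Hypothetical VM slice table: (name, time slice in ms). SWITCH_PERIOD is the
# full rotation length. All numbers here are invented for illustration.
vms = [("vm0", 4.0), ("vm1", 2.0), ("vm2", 2.0)]
SWITCH_PERIOD = 10.0  # ms; slices plus context-switch/idle overhead

# Per-VM share of the GPU is TS_k / SP; the sum over VMs is the throughput
# figure from the text and doubles as the utilization bound: it never exceeds 1.
shares = {name: ts / SWITCH_PERIOD for name, ts in vms}
throughput = sum(shares.values())
assert throughput <= 1.0

# Worst-case command latency: the command's own execution time plus waiting
# out the longest slice currently held by another VM.
L_exec = 0.5                     # ms, invented
ts_max = max(ts for _, ts in vms)
l_max = L_exec + ts_max          # L_max = L_exec + TS_max

print(f"throughput={throughput:.2f}, L_max={l_max} ms")
```

With these numbers, 2 ms of each 10 ms period is unassigned, so the GPU tops out at 80% utilization regardless of guest demand.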
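The shadow-register fence from the list can be illustrated with a minimal model (all names and timings invented): the GPU signals a hypervisor-owned shadow value, and the guest only sees it once the hypervisor propagates it, which is where the extra scheduling delay enters T_wait = max(T_real, T_sched):

```python
class ShadowFence:
    """Hypervisor-owned shadow of a guest fence (illustrative names only)."""
    def __init__(self):
        self.shadow_value = 0    # written by the (simulated) physical GPU
        self.guest_value = 0     # what the guest driver actually polls

    def gpu_signal(self, value):
        # The GPU writes the shadow address the hypervisor substituted...
        self.shadow_value = value

    def hypervisor_propagate(self):
        # ...and the hypervisor forwards it into the guest's view; the delay
        # before this call runs is the T_sched component of the wait.
        self.guest_value = self.shadow_value

fence = ShadowFence()
fence.gpu_signal(42)
assert fence.guest_value == 0    # guest cannot observe the raw GPU write
fence.hypervisor_propagate()
assert fence.guest_value == 42

# The observed wait is whichever finishes later: the real GPU work (T_real)
# or the hypervisor getting around to propagating the fence (T_sched).
t_real, t_sched = 3.0, 4.5       # ms, made-up numbers
t_wait = max(t_real, t_sched)
print(t_wait)                    # → 4.5
```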
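The monitoring and adaptive-quantum formulas from the list can be checked with toy counters (all values invented, and the per-quantum ceiling is an assumption, not a real hardware figure):

```python
# Invented per-VM counters: vm → (allocated quanta, completed operations).
counters = {"vm0": (100, 7000), "vm1": (50, 2000)}
MAX_OPS_PER_QUANT = 100   # assumed per-quantum ceiling, not a real spec value

# E = (Σ Completed_operations) / (Σ Allocated_quanta * Max_ops_per_quant)
completed = sum(ops for _, ops in counters.values())
allocated = sum(quanta for quanta, _ in counters.values())
efficiency = completed / (allocated * MAX_OPS_PER_QUANT)

# Adaptive quantum: TS_high = TS_base * (1 + α), with α bounded to 0…2.
def adjusted_quantum(ts_base, alpha):
    assert 0.0 <= alpha <= 2.0, "priority coefficient out of range"
    return ts_base * (1.0 + alpha)

print(efficiency)                  # → 0.6
print(adjusted_quantum(2.0, 0.5))  # → 3.0
```

An efficiency well below 1 here means VMs are receiving quanta they do not fill, which is the signal an adaptive scheduler would use to shrink background shares.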
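Interrupt retransmission by context ID, as described in the list, reduces to a table lookup in the hypervisor. The context IDs and vector numbers below are arbitrary:

```python
# Hypothetical routing tables for intercepted MSI-X interrupts.
context_to_vm = {0x11: "vm0", 0x22: "vm1"}
virtual_vector = {"vm0": 0x40, "vm1": 0x41}   # virtual vectors, chosen arbitrarily

def route_irq(context_id):
    """Return (vm, virtual vector) for a physical interrupt, as the text
    describes: identify the target VM by context ID, then retransmit."""
    vm = context_to_vm.get(context_id)
    if vm is None:
        return None   # no owning context: treat as spurious and drop it
    return vm, virtual_vector[vm]

print(route_irq(0x22))   # → ('vm1', 65)
print(route_irq(0x99))   # → None
```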
Comparisons
- GPU-PV vs GPU Passthrough. GPU-PV shares the physical GPU among multiple VMs through the hypervisor, providing each with virtual access at the cost of scheduling overhead. GPU Passthrough (VFIO) dedicates the entire GPU to a single VM, eliminating virtualization overhead but preventing sharing. GPU-PV wins in density; Passthrough wins in performance and driver compatibility.
- GPU-PV vs vGPU (SR-IOV). GPU-PV partitions the GPU in software through the hypervisor, without hardware sharing support, which increases latency. SR-IOV-based vGPU creates hardware virtual functions (VFs) with direct resource access, delivering near-native performance. GPU-PV is simpler to deploy, but SR-IOV is preferred for high-load graphics and compute environments.
- GPU-PV vs API remoting (e.g., RDS or GPU forwarding). GPU-PV provides the guest with a full GPU driver and virtual device, allowing arbitrary graphics code to run locally on the host. API remoting intercepts calls (e.g., DirectX or CUDA) and forwards them to a separate server for execution — ideal for heterogeneous environments but suffers from network latency and limited API support.
- GPU-PV vs Mediated Pass-Through (mdev, e.g., Intel GVT-g). Both technologies share the GPU among VMs, but mdev uses the host driver to create mediated devices with direct hardware access to acceleration engines and memory. GPU-PV routes all GPU traffic through the hypervisor, offering better isolation and easier VM migration, but it lags behind mdev in I/O and memory management performance.
- GPU-PV vs Time-sliced virtualization (KVM vGPU). Time-sliced virtualization cyclically assigns the entire GPU to each VM for micro-intervals, creating the appearance of simultaneous operation without modifying the guest driver. GPU-PV continuously virtualizes command queues and interrupts. Time-slicing is simpler to implement for single-threaded tasks, but GPU-PV is more efficient for parallel compute workloads due to finer resource control.
OS and driver support
GPU-PV requires Windows Server 2022/2025 or Windows 11 (21H2+), and supported Windows guest OSes (10/11) with WDDM 2.9+. For Linux, passthrough is possible via the NVIDIA vGPU plugin or DDA, but host drivers must be version 530+ with support for the paravirtual rendering channel. The guest driver operates through the GPU paravirtualization interface without direct PCIe passthrough, and window manager commands are converted by the host scheduler.
Security
GPU-PV isolation relies on I/O virtualization: each virtual GPU gets a separate address space via the IOMMU/SMMU, the hypervisor’s memory scheduler maintains the corresponding TLB entries, and the host driver handles page swapping. A faulty guest driver cannot read data belonging to another VM, because every DMA transaction passes through the IOMMU with buffer boundary checks.
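The ownership check P = V(i) AND M(i) given in the feature list amounts to a per-page lookup in the IOMMU tables. A toy model with invented mappings:

```python
# Toy IOMMU table: page number → (valid bit, owning VM). All values invented.
iommu = {
    0x1000: (True, "vm0"),
    0x2000: (True, "vm1"),
    0x3000: (False, "vm0"),   # mapping torn down: V(i) = 0
}

def dma_allowed(page, vm):
    """P = V(i) AND M(i): the page must be mapped AND owned by the requester."""
    valid, owner = iommu.get(page, (False, None))
    return valid and owner == vm

assert dma_allowed(0x1000, "vm0")        # own, valid page: permitted
assert not dma_allowed(0x2000, "vm0")    # another VM's page: blocked
assert not dma_allowed(0x3000, "vm0")    # stale mapping: blocked
```

Both conditions must hold: a valid mapping owned by someone else fails just as a torn-down mapping does, which is what prevents cross-VM DMA reads.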
Logging
GPU-PV events are recorded in Event Viewer under Microsoft-Windows-Hyper-V-GPU-PV/Operational for successful attachments and driver versions, and Microsoft-Windows-Hyper-V-GPU-PV/Debug for command timeouts and channel resets. ETW tracing is also available with provider {f4e3b0c4-0b2d-4b5c-9a6e-2d5c3f7e8a9b} for monitoring command buffer queues. If the guest GPU crashes, the host logs the DMA error code and forcibly resets the virtual function.
Limitations
OpenGL 4.5+ is not supported on Linux guests due to the lack of Vulkan command translation into the host driver. The maximum video buffer per VM is limited to one quarter of total VRAM based on host policy. Rebooting the guest does not clear the host shader cache; a full VM restart is required. Live migration is not supported with GPU-PV because the virtual function is bound to a specific physical adapter.
History and development
GPU-PV first appeared in the prerelease version of Windows Server 2016 Hyper-V as Discrete Device Assignment with no resource sharing. True GPU-PV with time slicing arrived in 2020 with Windows Server 2022. In 2023, WDDM 3.0 added support for Cross-Adapter Scan-Out for VR devices. Future versions (Windows Server 2025 R2) are expected to include hardware support for switching virtual functions between VMs and integration with CI/CD pipelines via WMI for dynamic VRAM allocation.