KVM_HC (Hypervisor call from guest to host)

KVM_HC (named after the KVM_HC_* hypercall numbers) is a mechanism for a direct software request from a virtual machine to the KVM hypervisor. The guest operating system executes a special instruction to ask the host to perform a privileged action, or to provide information or services that the guest cannot obtain on its own because it runs virtualized.

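Linux guests use this mechanism for paravirtual features such as waking halted vCPUs, multicast inter-processor interrupts, directed yielding and precise host time. On s390, virtio-ccw device notifications also travel over the same DIAGNOSE 0x500 hypercall interface. KVM Clock itself is configured through MSRs rather than hypercalls, but the related KVM_HC_CLOCK_PAIRING call lets the ptp_kvm driver read the host's CLOCK_REALTIME together with the matching guest TSC value, so the guest clock can be kept closely aligned with the host's instead of drifting. Because a hypercall is a single trapping instruction with its arguments in registers, the guest can request such services from the host without going through emulated hardware I/O devices.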

Typical problems

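The main problem is handling malformed or outdated requests, for example from older guests that issue hypercalls the host no longer implements or has not enabled. Wakeup hypercalls are also prone to races: if a KVM_HC_KICK_CPU request arrives just before the target vCPU halts, the wakeup can be lost and the vCPU hangs unless the host remembers the pending kick (KVM records it in a per-vCPU flag). A faulty handler on the host side can leak resources or crash the host kernel, taking down every virtual machine on it, and a handler that trusts guest-supplied values can break the isolation between virtual machines.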

Principle of KVM_HC operation

The principle of operation is based on trapping the VMCALL instruction on Intel processors or VMMCALL on AMD processors while the guest runs in VMX non-root mode (guest mode on AMD SVM). When the guest kernel executes this instruction with the hypercall number in RAX, the processor performs a VM exit and transfers control to the KVM module. The kvm_emulate_hypercall handler reads the number and arguments and dispatches to the corresponding service. For example, KVM_HC_KICK_CPU wakes a vCPU, identified by its APIC ID, that halted while waiting for a paravirtual spinlock, using a single hypercall. KVM_HC_CLOCK_PAIRING returns the host's CLOCK_REALTIME together with the guest TSC value at the same instant, so the guest can correlate the two clocks without the error introduced by taking two separate readings. Arguments are passed in RBX, RCX, RDX and RSI (up to four), and the result is written to RAX before the guest resumes at the instruction following the hypercall. Abuse is contained by validating every request: an unknown hypercall number returns -KVM_ENOSYS, a call from guest user mode (CPL > 0) returns -KVM_EPERM, and invalid arguments produce an error code for the guest rather than a host crash, preserving the stability of the whole virtualization platform.
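The dispatch step can be modeled in ordinary user-space C as a rough sketch. struct guest_regs, emulate_hypercall() and model_send_ipi() are hypothetical stand-ins for KVM's vCPU state and handlers, and the feature-bit checks KVM performs are omitted; only the hypercall numbers and error codes follow include/uapi/linux/kvm_para.h.

```c
#include <stdint.h>

/* Hypercall numbers and error codes as defined in include/uapi/linux/kvm_para.h. */
#define KVM_HC_VAPIC_POLL_IRQ 1
#define KVM_HC_KICK_CPU       5
#define KVM_HC_SEND_IPI       10
#define KVM_ENOSYS            1000
#define KVM_EPERM             1     /* KVM_EPERM is EPERM */

/* Hypothetical snapshot of guest registers at the VM exit. */
struct guest_regs {
    uint64_t rax;                 /* hypercall number in, result out */
    uint64_t rbx, rcx, rdx, rsi;  /* arguments a0..a3 */
    int cpl;                      /* guest privilege level, 0 = kernel */
};

/* Models the per-vCPU "hypercalls" statistic. */
static unsigned long hypercalls;

/* Hypothetical SEND_IPI service: counts set bits in the 128-bit
 * destination bitmap, standing in for "number of vCPUs reached". */
static int64_t model_send_ipi(uint64_t lo, uint64_t hi)
{
    int64_t n = 0;
    for (int i = 0; i < 64; i++)
        n += (int64_t)(((lo >> i) & 1u) + ((hi >> i) & 1u));
    return n;
}

/* Simplified model of the dispatch done by kvm_emulate_hypercall(). */
void emulate_hypercall(struct guest_regs *r)
{
    uint64_t nr = r->rax;
    uint64_t a0 = r->rbx, a1 = r->rcx;
    int64_t ret;

    if (r->cpl != 0) {
        ret = -KVM_EPERM;         /* calls from guest user mode are refused */
    } else {
        switch (nr) {
        case KVM_HC_VAPIC_POLL_IRQ:
            ret = 0;              /* the exit itself lets the host check pending IRQs */
            break;
        case KVM_HC_KICK_CPU:
            ret = 0;              /* a1 = APIC ID of the vCPU to wake (not modeled) */
            break;
        case KVM_HC_SEND_IPI:
            ret = model_send_ipi(a0, a1);
            break;
        default:
            ret = -KVM_ENOSYS;    /* unknown numbers get an error, never a host crash */
            break;
        }
    }
    r->rax = (uint64_t)ret;       /* the result is handed back in RAX */
    hypercalls++;
}
```

Calling emulate_hypercall() on a register set with RAX = 42 and cpl = 0 leaves -1000 (-KVM_ENOSYS) in RAX, while any call with cpl = 3 yields -1 (-KVM_EPERM) because the privilege check comes before the dispatch.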

KVM_HC functionality

  1. Signature and identification. To perform a hypercall on x86, the guest kernel loads the hypercall number into RAX and up to four arguments into RBX, RCX, RDX and RSI, then executes VMCALL (Intel) or VMMCALL (AMD). Hypercall numbers are small integers defined in include/uapi/linux/kvm_para.h, from KVM_HC_VAPIC_POLL_IRQ (1) to KVM_HC_MAP_GPA_RANGE (12). They are not taken from a vendor-reserved range, which is why the guest must first confirm through CPUID that it is running on KVM.
  2. Register-based data transfer model. The interface is purely register-based, with at most four parameters on x86, and the return value is placed in RAX. When more data must be exchanged, the guest passes the guest physical address (GPA) of a buffer in a register, and the host reads or writes guest memory at that address, as KVM_HC_CLOCK_PAIRING does.
  3. KVM_HC_CLOCK_PAIRING. This hypercall binds guest timestamps to host time. Its first argument is the GPA of a struct kvm_clock_pairing, and the second is the clock type, which must be KVM_CLOCK_PAIRING_WALLCLOCK (the host's CLOCK_REALTIME). The host fills in seconds, nanoseconds and the guest TSC value corresponding to that reading. Linux's ptp_kvm driver uses it to expose host time as a PTP clock inside the guest, so time-synchronization daemons in the guest can follow the host clock precisely.
  4. TSC binding mechanism. On a KVM_HC_CLOCK_PAIRING call, KVM takes a host CLOCK_REALTIME reading together with the host TSC value it was computed from, converts that TSC value into the guest's TSC domain, and writes both to guest memory. The guest thus receives one correlated synchronization point instead of two separate readings separated by an unknown delay, which matters when guest event timestamps must line up with host or network time at microsecond precision.
  5. KVM_HC_SEND_IPI. This hypercall lets the guest send an inter-processor interrupt to many vCPUs at once. The destinations form a 128-bit bitmap split across a0 (low 64 bits) and a1 (high 64 bits); bit 0 of a0 stands for the APIC ID given in a2, bit 1 for a2 + 1, and so on, while a3 carries the APIC ICR value (vector and delivery mode). One call covers up to 128 destinations in 64-bit mode and 64 in 32-bit mode.
  6. Broadcast optimization. In physical destination mode, a multicast IPI otherwise needs one ICR write, and therefore one VM exit, per destination. KVM_HC_SEND_IPI handles the whole destination set in a single exit: KVM delivers the interrupt to each destination's in-kernel local APIC, wakes the vCPU if it is halted, and returns the number of vCPUs the IPI was delivered to.
  7. KVM_HC_SCHED_YIELD. This hypercall asks the host to let a specific vCPU run; the guest passes the target's APIC ID in a0. Linux issues it after sending call-function IPIs: if one of the target vCPUs has been preempted on the host, the sender yields to it instead of spinning until the preempted vCPU is scheduled again, a problem closely related to lock holder preemption in overcommitted environments.
  8. Resource yield mechanics. On KVM_HC_SCHED_YIELD, KVM looks up the vCPU with the given APIC ID and, if it is preempted but runnable, calls kvm_vcpu_yield_to(), which uses the host scheduler's yield_to() to hand the rest of the caller's time slice to the target's thread. The call returns 0 whenever the feature is enabled, so it is only a hint: the guest cannot tell whether the yield took effect.
  9. KVM_HC_MAP_GPA_RANGE. This hypercall notifies the host that the guest has changed the encryption status of a range of its memory. a0 is the GPA of the first page, a1 the number of 4 KiB pages, and a2 the attributes. Confidential-computing guests such as AMD SEV use it so that the host and the VMM know which pages are shared (plaintext) and which are private, for example to handle them correctly during live migration. The hypercall does not itself change any GPA-to-HPA mapping in EPT (Intel's hardware second-level address translation) or its AMD equivalent.
  10. Memory attribute management. Bits 3:0 of a2 are a preferred page-size hint (0 = 4 KiB, 1 = 2 MiB, 2 = 1 GiB), bit 4 (KVM_MAP_GPA_RANGE_ENCRYPTED) is 1 for encrypted and 0 for plaintext, and bits 63:5 are reserved and must be zero. There are no read, write or execute permission flags. The benefit is that the host learns the state of a range from one explicit call instead of inferring it by parsing guest page tables.
  11. Reverse transitions. KVM defines no separate KVM_HC_UNMAP_GPA_RANGE hypercall. Converting a range from encrypted back to shared uses the same KVM_HC_MAP_GPA_RANGE call with bit 4 cleared, so one interface describes transitions in both directions.
  12. Scope. KVM_HC_MAP_GPA_RANGE only reports the state of memory the guest already owns; it does not add memory, remove it, or return it to the host. Handing unused guest memory back to the host is still the job of virtio-balloon, and resizing guest memory at runtime that of memory hotplug devices such as virtio-mem.
  13. KVM_HC_FEATURES. On x86, hypercall availability is not queried with a hypercall: the guest reads the feature bitmap in EAX of CPUID leaf 0x40000001 (KVM_CPUID_FEATURES), where bits such as KVM_FEATURE_PV_UNHALT, KVM_FEATURE_PV_SEND_IPI, KVM_FEATURE_PV_SCHED_YIELD and KVM_FEATURE_HC_MAP_GPA_RANGE advertise the corresponding hypercalls. KVM_HC_FEATURES (number 3) exists only on PowerPC, as a KVM-specific alternative to the device-tree lookup prescribed by ePAPR.
  14. Version negotiation. Reading the feature bits has no side effects. If a bit is clear, the guest must not issue the corresponding hypercall and should use the native mechanism instead (for example, ordinary APIC ICR writes instead of KVM_HC_SEND_IPI). Because the VMM decides which bits to expose, a VM can be migrated between hosts running different KVM versions as long as the destination supports every feature the guest was started with.
  15. Encrypted memory. KVM has no KVM_HC_MEM_ATTRIBUTES hypercall; private/shared conversions are reported through KVM_HC_MAP_GPA_RANGE. AMD SEV guests issue that hypercall directly, while SEV-SNP page-state-change requests and TDX MapGPA requests, which arrive through their own guest-host protocols, are forwarded by KVM to userspace in the same KVM_HC_MAP_GPA_RANGE exit format. For guest_memfd-backed VMs, the VMM then applies the change with the KVM_SET_MEMORY_ATTRIBUTES ioctl.
  16. Cache coherence during encryption switches. The hypercall is only a notification. The guest itself flips the encryption bit (the C-bit on AMD) in its own page tables and flushes the affected cache lines where the CPU requires it, then informs the host. The change and the notification therefore happen in a well-defined order, and the host never has to detect the change from guest page tables.
  17. Wakeups and interrupt polling. Two hypercalls deal with wakeups and pending interrupts without interrupt controller emulation: KVM_HC_KICK_CPU wakes a vCPU that executed HLT while waiting for a paravirtual spinlock (a1 holds its APIC ID), and KVM_HC_VAPIC_POLL_IRQ simply forces a guest exit so the host can check for pending interrupts on re-entry. There is no hypercall for injecting arbitrary events into the guest's own vCPUs; device notifications go through ioeventfd and irqfd or emulated hardware.
  18. Error handling. Hypercalls return their result in RAX: zero (or, for KVM_HC_SEND_IPI, a count) on success, or a negative code such as -KVM_ENOSYS, -KVM_EINVAL, -KVM_EFAULT, -KVM_EPERM or -KVM_EOPNOTSUPP on failure. On failure the guest must not crash but should fall back to the non-paravirtual path, so that it keeps working on hosts with different KVM versions or configurations.
  19. Paravirtual debugging. The hypercall table reserves KVM_HC_MIPS_CONSOLE_OUTPUT on MIPS for writing a string to a console, with the terminal number, a pointer to the string, and its length as arguments. x86 KVM has no such hypercall; debug output from x86 guests normally goes through an emulated serial port or a virtio console.
  20. Security and isolation. Arguments that are guest physical addresses are translated through the VM's own memory slots, so they can never refer to host memory or to another guest's memory. If a GPA is not backed by guest memory, the hypercall fails with -KVM_EFAULT instead of faulting, and hypercalls issued from guest user mode are rejected with -KVM_EPERM, which keeps unprivileged guest code from using them for privilege escalation.
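The feature discovery and error-handling rules above can be sketched as two small helpers. kvm_has_feature() and kvm_hc_result() are hypothetical names; the feature bit numbers and error codes are copied from the kernel's uapi kvm_para.h headers as best understood here and should be checked against the headers actually in use. A real guest would obtain the EAX value by executing CPUID leaf 0x40000001.

```c
#include <stdbool.h>
#include <stdint.h>

/* Feature bits in EAX of CPUID leaf 0x40000001, per
 * arch/x86/include/uapi/asm/kvm_para.h (verify against your headers). */
#define KVM_FEATURE_PV_UNHALT         7
#define KVM_FEATURE_PV_SEND_IPI       11
#define KVM_FEATURE_PV_SCHED_YIELD    13
#define KVM_FEATURE_HC_MAP_GPA_RANGE  16

/* Error codes from include/uapi/linux/kvm_para.h; KVM returns them negated. */
#define KVM_ENOSYS     1000
#define KVM_EOPNOTSUPP 95
#define KVM_EINVAL     22   /* = EINVAL */
#define KVM_EFAULT     14   /* = EFAULT */
#define KVM_EPERM      1    /* = EPERM */

/* Hypothetical helper: is a hypercall advertised by the host? */
bool kvm_has_feature(uint32_t features_eax, unsigned int bit)
{
    return ((features_eax >> bit) & 1u) != 0;
}

/* Hypothetical helper: describe a hypercall return value. Non-negative
 * means success (KVM_HC_SEND_IPI returns a count rather than 0). */
const char *kvm_hc_result(long ret)
{
    if (ret >= 0)
        return "ok";
    switch (-ret) {
    case KVM_ENOSYS:     return "ENOSYS: not implemented or not enabled";
    case KVM_EOPNOTSUPP: return "EOPNOTSUPP: host cannot provide this service";
    case KVM_EINVAL:     return "EINVAL: malformed arguments";
    case KVM_EFAULT:     return "EFAULT: address not backed by guest memory";
    case KVM_EPERM:      return "EPERM: issued from guest user mode";
    default:             return "unknown error";
    }
}
```

A guest would consult kvm_has_feature() before the first call and treat any negative result as a signal to switch to the native path rather than as a fatal error.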

Comparisons

  • KVM_HC vs CPUID. KVM hypercalls are active requests to the host, whereas CPUID only reports static information. CPUID leaves 0x40000000 and 0x40000001 tell the guest that it is running on KVM and which paravirtual features, including which hypercalls, are available, and reading them has no side effects. KVM_HC passes arguments in general-purpose registers and can return larger results through guest memory, while CPUID is limited to reading fixed leaves and cannot request services such as precise host time or inter-processor interrupts.
  • KVM_HC vs VMCALL/VMMCALL. VMCALL (Intel) and VMMCALL (AMD) are only the trapping instructions; KVM_HC is the calling convention layered on top of them, with the hypercall number in RAX, arguments in RBX, RCX, RDX and RSI, and the result in RAX. The Linux guest selects the right instruction for the CPU at boot, and if a guest executes the other vendor's instruction, for instance after migrating between Intel and AMD hosts, KVM emulates it. Because Hyper-V, Xen and other hypervisors use the same instructions with different conventions, a guest must identify KVM through CPUID before issuing KVM hypercalls.
  • KVM_HC vs Hyper-V hypercall. Both interfaces ultimately trap with VMCALL or VMMCALL; they differ in setup and calling convention. A Hyper-V guest first writes its guest OS ID to the HV_X64_MSR_GUEST_OS_ID MSR and then enables a hypercall page through HV_X64_MSR_HYPERCALL; the hypervisor fills that page with the correct instruction sequence, and the guest calls into it with a call code in RCX and, for most hypercalls, the GPAs of input and output parameter blocks in RDX and R8 (a register-only "fast" form also exists). KVM_HC needs no setup beyond CPUID detection and passes everything in registers. KVM itself implements the Hyper-V interface for Windows guests, so one host can offer both. Neither form avoids a VM exit: every hypercall causes one.
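  • KVM_HC vs Xen hypercall_page. A Xen HVM guest tells Xen, through an MSR advertised in Xen's CPUID leaves, where to place a hypercall page; Xen fills it with 32-byte stubs containing the right VMCALL or VMMCALL, and the guest invokes hypercall N by calling hypercall_page + 32 × N. KVM has no such page: the guest issues VMCALL or VMMCALL inline with the number in RAX, and if it uses the other vendor's instruction, KVM emulates it (and by default also rewrites it in guest memory). The page indirection lets Xen change the entry sequence without rebuilding the guest, while KVM's fixed register ABI is simpler to validate but leaves less room to change the entry mechanism later.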
  • KVM_HC vs PSCI (ARM). On arm64, KVM does not use the KVM_HC_* numbers at all: guest-to-hypervisor calls follow the Arm SMC Calling Convention (SMCCC) and are issued with the HVC instruction. PSCI, an Arm specification for CPU and system power management (CPU_ON, CPU_OFF, CPU_SUSPEND, SYSTEM_OFF, SYSTEM_RESET), is one service carried over SMCCC. KVM-specific services live in SMCCC's vendor-specific hypervisor range; for example, the arm64 ptp_kvm driver reads precise host time through such a call, the counterpart of KVM_HC_CLOCK_PAIRING on x86. Using HVC rather than SMC keeps these calls addressed to the hypervisor instead of secure firmware.
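A guest must recognize KVM before issuing KVM hypercalls. The sketch below rebuilds the 12-byte vendor signature from the EBX, ECX and EDX values of CPUID leaf 0x40000000 and compares it with KVM's "KVMKVMKVM\0\0\0". The function names are hypothetical, and reading the registers (for example with GCC's <cpuid.h>) is left out so the code stays portable and testable. Linux additionally scans bases 0x40000100, 0x40000200 and so on, because KVM's leaves can move up when the VMM also exposes Hyper-V enlightenments.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Store a 32-bit register value as 4 little-endian bytes, the order in
 * which CPUID returns signature characters. */
static void put_le32(char *dst, uint32_t v)
{
    for (int i = 0; i < 4; i++)
        dst[i] = (char)((v >> (8 * i)) & 0xffu);
}

/* Hypothetical check: do EBX, ECX, EDX of CPUID leaf 0x40000000 spell
 * KVM's signature "KVMKVMKVM\0\0\0"? */
bool is_kvm_signature(uint32_t ebx, uint32_t ecx, uint32_t edx)
{
    char sig[12];
    put_le32(sig + 0, ebx);
    put_le32(sig + 4, ecx);
    put_le32(sig + 8, edx);
    return memcmp(sig, "KVMKVMKVM\0\0\0", 12) == 0;
}
```

For example, is_kvm_signature(0x4b4d564b, 0x564b4d56, 0x4d) is true, while the register values that spell Hyper-V's "Microsoft Hv" are rejected.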

OS and driver support

Support for KVM_HC hypercalls is implemented in the guest kernel's paravirtualization code (arch/x86/kernel/kvm.c on x86) and in a few drivers such as ptp_kvm. On x86, the guest executes the vmcall (or vmmcall) instruction with the hypercall number in rax and up to four arguments in rbx, rcx, rdx and rsi; the KVM host handles the resulting VM exit, performs the requested operation, and returns the result in rax. On PowerPC, hypercalls use a 4-byte instruction sequence that the guest takes from the hypercall-instructions property of the device tree's /hypervisor node, with the hypercall number in R11 and arguments in R3-R10. On S390, the DIAGNOSE instruction with code 0x500 is used, with the hypercall number in R1, arguments in R2-R7 and the result in R2. Guest code includes support for specific functions such as accelerated inter-processor interrupt (IPI) sending via KVM_HC_SEND_IPI, which passes a bitmap of destination APIC IDs covering up to 128 destinations per call in 64-bit mode.
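The KVM_HC_SEND_IPI bitmap encoding can be illustrated with a small packing routine. pack_send_ipi() and struct send_ipi_args are hypothetical; the routine assumes the APIC IDs arrive sorted in ascending order and greedily opens a new 128-ID window whenever the next ID does not fit, which is simpler than the strategy the Linux guest itself uses. The ICR argument (a3) is left out.

```c
#include <stddef.h>
#include <stdint.h>

/* Arguments a0..a2 of one KVM_HC_SEND_IPI call (a3, the ICR, omitted). */
struct send_ipi_args {
    uint64_t lo;   /* a0: bits for APIC IDs min .. min+63 */
    uint64_t hi;   /* a1: bits for APIC IDs min+64 .. min+127 */
    uint32_t min;  /* a2: APIC ID represented by bit 0 of a0 */
};

/* Pack ascending APIC IDs into calls; out must have room for n entries.
 * Each call covers a 128-ID window starting at its lowest destination.
 * Returns the number of hypercalls needed. */
size_t pack_send_ipi(const uint32_t *ids, size_t n, struct send_ipi_args *out)
{
    size_t calls = 0;
    for (size_t i = 0; i < n; i++) {
        if (calls == 0 || ids[i] - out[calls - 1].min >= 128) {
            out[calls].lo = 0;             /* open a new 128-ID window */
            out[calls].hi = 0;
            out[calls].min = ids[i];
            calls++;
        }
        uint32_t off = ids[i] - out[calls - 1].min;
        if (off < 64)
            out[calls - 1].lo |= 1ULL << off;
        else
            out[calls - 1].hi |= 1ULL << (off - 64);
    }
    return calls;
}
```

For destinations 3, 4, 70 and 200 it produces two calls: one with a2 = 3, a0 = 0x3 and a1 = 0x8, and one with a2 = 200 and a0 = 0x1, instead of four separate ICR writes.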

Security

The security of KVM_HC rests on argument validation on the KVM host side: the kvm_emulate_hypercall handler checks the caller's privilege level, the hypercall number and the passed values before executing the operation, so the guest cannot gain unauthorized access to the host system's memory or resources. KVM_HC_MAP_GPA_RANGE is handled in two stages. KVM first returns -KVM_ENOSYS unless the VMM has enabled this exit with KVM_CAP_EXIT_HYPERCALL, and -KVM_EINVAL if the start address is not page-aligned, the page count is zero, or the range wraps around. It then forwards the request to userspace (QEMU) as a KVM_EXIT_HYPERCALL exit, where the VMM checks that the change of guest physical memory attributes is acceptable (for example, switching between private and shared memory for encryption). This includes validating the address range against the guest's memory and inspecting the KVM_MAP_GPA_RANGE_ENCRYPTED flag to prevent data leakage between protected and unprotected memory domains.
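The argument checks for KVM_HC_MAP_GPA_RANGE can be sketched as a single function. check_map_gpa_range() and MAP_GPA_PAGE_SZ_MASK are hypothetical names. The alignment, zero-length and wrap-around checks mirror the ones described above; the reserved-bit check encodes the documented rule that bits 63:5 of the attributes must be zero, and in practice it may be enforced by the VMM rather than by KVM.

```c
#include <stdint.h>

#define PAGE_SHIFT 12
#define PAGE_SIZE  (1ULL << PAGE_SHIFT)

/* Attribute layout of a2 as documented for KVM_HC_MAP_GPA_RANGE. */
#define MAP_GPA_PAGE_SZ_MASK        0xfULL       /* hypothetical name: bits 3:0 size hint */
#define KVM_MAP_GPA_RANGE_ENCRYPTED (1ULL << 4)  /* bit 4: 1 = encrypted, 0 = plaintext */

#define KVM_EINVAL 22

/* Hypothetical validation step: 0 if the request may be forwarded to
 * the VMM, -KVM_EINVAL otherwise. */
long check_map_gpa_range(uint64_t gpa, uint64_t npages, uint64_t attrs)
{
    uint64_t gfn = gpa >> PAGE_SHIFT;

    if (gpa & (PAGE_SIZE - 1))
        return -KVM_EINVAL;   /* start must be page aligned */
    if (npages == 0)
        return -KVM_EINVAL;   /* empty range */
    if (gfn + npages <= gfn)
        return -KVM_EINVAL;   /* range wraps around the address space */
    if (attrs & ~(MAP_GPA_PAGE_SZ_MASK | KVM_MAP_GPA_RANGE_ENCRYPTED))
        return -KVM_EINVAL;   /* reserved bits 63:5 must be zero */
    return 0;
}
```

The wrap-around test compares frame numbers rather than byte addresses, so a huge page count cannot overflow a byte-size calculation.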

Logging

Logging of KVM_HC operations is available at several levels. In the host kernel, KVM counts hypercalls in the per-vCPU hypercalls statistic (a single counter, not one per hypercall type) and emits the kvm:kvm_hypercall tracepoint with the hypercall number and the first four arguments, so tools such as perf or trace-cmd can record the frequency and types of requests from guests. For hypercalls forwarded to userspace, the complete_hypercall_exit callback writes the VMM's result into the guest's RAX when the vCPU resumes. In QEMU userspace, the KVM_HC_MAP_GPA_RANGE handler is traced via trace_kvm_hc_map_gpa_range, which records the guest physical address, region size, attributes, and flags. KVM_CAP_EXIT_HYPERCALL is not a general tracing facility: it applies only to hypercalls in KVM's supported mask, currently just KVM_HC_MAP_GPA_RANGE, whose data is then passed to userspace where it can be logged with full context.
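On the VMM side, handling and logging a forwarded KVM_HC_MAP_GPA_RANGE exit might look roughly like the following. struct hc_exit is a hypothetical mirror of the hypercall fields of struct kvm_run (nr, args, ret, flags) so the sketch compiles without <linux/kvm.h>, and the log line only imitates the fields of QEMU's trace event; this is not QEMU code.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

#define KVM_HC_MAP_GPA_RANGE        12
#define KVM_MAP_GPA_RANGE_ENCRYPTED (1ULL << 4)
#define KVM_ENOSYS                  1000

/* Hypothetical mirror of the hypercall part of struct kvm_run. */
struct hc_exit {
    uint64_t nr;
    uint64_t args[6];
    uint64_t ret;    /* copied into the guest's RAX on the next KVM_RUN */
    uint64_t flags;
};

/* Log a forwarded hypercall and set the value the guest will see. */
void handle_hypercall_exit(struct hc_exit *hc, FILE *log)
{
    if (hc->nr != KVM_HC_MAP_GPA_RANGE) {
        hc->ret = (uint64_t)-KVM_ENOSYS;   /* only MAP_GPA_RANGE is expected here */
        return;
    }
    uint64_t gpa = hc->args[0], npages = hc->args[1], attrs = hc->args[2];

    fprintf(log, "map_gpa_range gpa=0x%" PRIx64 " size=0x%" PRIx64
                 " attrs=0x%" PRIx64 " flags=0x%" PRIx64 " (%s)\n",
            gpa, npages * 4096, attrs, hc->flags,
            (attrs & KVM_MAP_GPA_RANGE_ENCRYPTED) ? "private" : "shared");

    /* A real VMM would update its view of the range here, for example
     * with the KVM_SET_MEMORY_ATTRIBUTES ioctl for guest_memfd memory. */
    hc->ret = 0;
}
```

Keeping the log call inside the handler means every conversion is recorded with the same fields the VMM acts on, which is what makes the log useful when debugging encrypted-memory transitions.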

Limitations

The main limitation of KVM_HC is its fixed set of functions: kernel modules cannot add new hypercalls without modifying the core KVM code, although attempts at implementing dynamic registration of hypercall handlers have historically been made. On x86, arguments are limited to four 64-bit registers (rbx, rcx, rdx, rsi), so larger payloads must go through guest memory. The KVM_HC_CLOCK_PAIRING hypercall works only when the host's clocksource is the TSC and only for the CLOCK_REALTIME clock type (KVM_CLOCK_PAIRING_WALLCLOCK); if either condition is not met, it returns -KVM_EOPNOTSUPP.
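Once KVM_HC_CLOCK_PAIRING has returned a (time, TSC) pair, a later TSC reading can be converted to host wall-clock time by scaling the TSC delta. The structure layout follows struct kvm_clock_pairing; realtime_ns_at() is hypothetical, and it assumes the guest already knows its TSC frequency in kHz (how it obtains that is outside this sketch) and that the TSC does not go backwards between the two readings.

```c
#include <stdint.h>

#define KVM_EOPNOTSUPP 95

/* Layout of struct kvm_clock_pairing (arch/x86/include/uapi/asm/kvm_para.h). */
struct kvm_clock_pairing {
    int64_t  sec;
    int64_t  nsec;
    uint64_t tsc;
    uint32_t flags;
    uint32_t pad[9];
};
_Static_assert(sizeof(struct kvm_clock_pairing) == 64, "must match the kernel ABI");

/* Hypothetical helper: estimate host CLOCK_REALTIME (ns) at guest TSC
 * tsc_now, given the hypercall's return value and the pairing it wrote.
 * Returns 0 on success, -1 if no estimate is possible. */
int realtime_ns_at(long hc_ret, const struct kvm_clock_pairing *p,
                   uint64_t tsc_now, uint64_t tsc_khz, int64_t *out_ns)
{
    if (hc_ret == -KVM_EOPNOTSUPP)
        return -1;   /* host clocksource is not TSC, or wrong clock type */
    if (hc_ret != 0 || tsc_khz == 0 || tsc_now < p->tsc)
        return -1;

    uint64_t delta = tsc_now - p->tsc;
    /* cycles -> ns is delta * 1e6 / tsc_khz; split to limit overflow */
    uint64_t ns = (delta / tsc_khz) * 1000000u
                + (delta % tsc_khz) * 1000000u / tsc_khz;
    *out_ns = p->sec * 1000000000LL + p->nsec + (int64_t)ns;
    return 0;
}
```

With a 3 GHz TSC (tsc_khz = 3000000), a reading 3,000,000 cycles after the pairing maps to the paired time plus 1 ms.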

History and development

The development of KVM_HC began with basic paravirtualized operations such as KVM_HC_MMU_OP (now deprecated) for page table management and KVM_HC_VAPIC_POLL_IRQ, which forces a guest exit so the host can check for pending interrupts. Over time, the set expanded with specialized calls: KVM_HC_KICK_CPU made it possible to wake halted vCPUs by APIC ID to support paravirtual spinlocks, and KVM_HC_CLOCK_PAIRING provided precise clock synchronization through the kvm_clock_pairing structure containing seconds, nanoseconds, and the TSC value. More recent additions are KVM_HC_SEND_IPI for multicast IPIs, KVM_HC_SCHED_YIELD for directed yielding to preempted vCPUs, and KVM_HC_MAP_GPA_RANGE, which is handled in userspace via KVM_CAP_EXIT_HYPERCALL to support memory encryption (SEV, SEV-SNP, TDX) and migration.