What is KVM (Turns the Linux kernel into a hypervisor)

KVM (Kernel-based Virtual Machine) is a subsystem of the Linux kernel that allows running full-fledged virtual machines. In essence, it adds to the kernel the ability to directly use hardware virtualization extensions of the processor (Intel VT-x or AMD-V), turning the Linux system itself into a lightweight hypervisor for guest operating systems.
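Whether a processor advertises these extensions can be checked from the CPU flags that Linux exposes. The following is a minimal sketch, assuming an x86 host where /proc/cpuinfo contains a "flags" line; the function name detect_virt_extension is illustrative and not part of any KVM tool.

```python
def detect_virt_extension(cpuinfo_text):
    """Return 'vmx' (Intel VT-x), 'svm' (AMD-V) or None.

    cpuinfo_text is the content of /proc/cpuinfo on an x86 Linux host.
    """
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() != "flags":
            continue
        flags = set(value.split())
        if "vmx" in flags:
            return "vmx"
        if "svm" in flags:
            return "svm"
    return None
```

On a real host the function would be called as detect_virt_extension(open("/proc/cpuinfo").read()). A result of None may also mean that the extension is disabled in firmware or hidden by an outer hypervisor.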

KVM underpins many public and private cloud platforms: it is the most widely used hypervisor in OpenStack deployments and the basis of Google Compute Engine. It is used for server consolidation in data centers, isolation of untrusted applications in corporate environments, and running containers with an elevated security level wrapped in lightweight virtual machines. Together with QEMU it is also widely used for development and testing, although KVM itself only accelerates guests of the host's own architecture; guests of a foreign architecture run under QEMU's software emulation.

Typical problems

The most common difficulty is a conflict with another hypervisor (for example VirtualBox or VMware Workstation) that has already claimed the processor's hardware virtualization extensions. Network configuration errors often leave guests cut off from the network or overload the physical interface. Memory overcommitment is another significant problem: when the host runs short of memory, ballooning, compression, and swapping can sharply degrade the performance of every virtual machine on the host.
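A quick way to look for a competing hypervisor is to inspect the list of loaded kernel modules. The sketch below assumes the module names vboxdrv (VirtualBox) and vmmon (VMware Workstation); the table and the function name are illustrative, and whether a loaded module actually conflicts with KVM depends on the product version.

```python
# Kernel modules of other hypervisors (illustrative, not exhaustive).
OTHER_HYPERVISOR_MODULES = {
    "vboxdrv": "VirtualBox",
    "vmmon": "VMware Workstation",
}


def conflicting_hypervisors(proc_modules_text):
    """Return names of other hypervisors whose modules are loaded.

    proc_modules_text is the content of /proc/modules, where the
    first field of every line is the module name.
    """
    loaded = {
        line.split()[0]
        for line in proc_modules_text.splitlines()
        if line.strip()
    }
    return sorted(
        product
        for module, product in OTHER_HYPERVISOR_MODULES.items()
        if module in loaded
    )
```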

How KVM works

KVM operation begins with loading the kernel module kvm.ko together with kvm-intel.ko or kvm-amd.ko, after which the Linux kernel itself acts as a virtual machine monitor. Each virtual machine appears in user space as an ordinary QEMU process. Guest code, however, is not emulated: it runs directly on the physical processor in the hardware guest mode.

When the guest executes a sensitive instruction or triggers another event that the hypervisor has chosen to intercept, the hardware performs a VM Exit and transfers control to the KVM module in the host kernel. The module analyzes the exit reason and either handles it inside the kernel or passes the request to user space, where the QEMU process emulates the specific input-output device.

Guest memory is managed through extended page tables (EPT on Intel processors, NPT on AMD), which let the guest OS work with its own virtual address space without hypervisor involvement in every access. To accelerate input-output, paravirtual virtio drivers are used: they share ring buffer queues between host and guest, which replaces slow emulation of real hardware and brings disk and network throughput close to native speed.
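The division of labor after a VM Exit can be pictured as a dispatch loop in user space. The sketch below is a simulation only: it never touches /dev/kvm, and the exits argument stands in for the values a real KVM_RUN call would return. The exit-reason numbers follow the constants in linux/kvm.h; run_vcpu and the handler table are hypothetical names.

```python
# Exit reasons as numbered in <linux/kvm.h>.
KVM_EXIT_IO = 2
KVM_EXIT_HLT = 5
KVM_EXIT_MMIO = 6
KVM_EXIT_SHUTDOWN = 8


def run_vcpu(exits, handlers):
    """Dispatch simulated exits to user-space device handlers.

    exits    -- iterable of (reason, payload) pairs, a stand-in for
                what repeated KVM_RUN calls would report
    handlers -- dict mapping an exit reason to a callable
    """
    log = []
    for reason, payload in exits:
        if reason in (KVM_EXIT_HLT, KVM_EXIT_SHUTDOWN):
            # This toy loop simply stops when the guest halts.
            log.append(("stop", reason))
            break
        handler = handlers.get(reason)
        if handler is None:
            raise RuntimeError(f"unhandled exit reason {reason}")
        log.append(handler(payload))
    return log
```

In a real VMM most exits never reach this loop, because KVM resolves them inside the kernel; only device accesses that need user-space emulation are forwarded.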

KVM functionality

Hypervisor Architecture and Kernel Module. KVM is implemented as a loadable Linux kernel module (kvm.ko) that turns the operating system kernel into a hypervisor running directly on bare metal. The module exposes the /dev/kvm device file, through which user space drives virtualization: creating virtual machines and vCPUs, setting up guest memory, and running guest code are all done with ioctl() calls on file descriptors obtained from this device.
Hardware-Accelerated Virtualization. KVM depends on hardware extensions of the central processor such as Intel VT-x or AMD-V. Intel VT-x provides VMX (Virtual Machine eXtensions) operation with the VMLAUNCH and VMRESUME instructions, which switch the processor from host root mode to guest non-root mode; AMD-V provides SVM (Secure Virtual Machine) mode with the VMRUN instruction for the same purpose. Guest code then runs directly on the processor without instruction emulation.
Virtual Memory Subsystem. Guest address translation is performed in two stages by the hardware. The guest's own page tables map guest virtual addresses to guest physical addresses, and a second set of tables maintained by the hypervisor maps guest physical addresses to host physical addresses. Intel hardware calls this second stage EPT (Extended Page Tables), and AMD calls it NPT (Nested Page Tables). The guest OS can therefore manage its own virtual memory without hypervisor involvement in critical execution paths.
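The two translation stages can be illustrated with a toy model. This sketch assumes flat single-level tables represented as dictionaries from page number to page number, which is far simpler than the multi-level tables the hardware walks; all names are illustrative.

```python
PAGE_SIZE = 4096


def translate(addr, table):
    """Translate an address through one flat page table (dict)."""
    page, offset = divmod(addr, PAGE_SIZE)
    if page not in table:
        raise LookupError(f"page fault at {addr:#x}")
    return table[page] * PAGE_SIZE + offset


def guest_virtual_to_host_physical(gva, guest_page_table, second_stage):
    """Apply both stages: guest tables first, then EPT/NPT."""
    gpa = translate(gva, guest_page_table)  # stage 1, owned by the guest OS
    return translate(gpa, second_stage)     # stage 2, owned by the hypervisor
```

A missing entry in the second-stage table corresponds to the situation in which the hardware would raise an EPT violation or nested page fault and exit to KVM.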
I/O Emulation with QEMU. The user space component, typically QEMU, uses /dev/kvm to create a virtual machine, allocate guest memory, and handle input-output instructions. QEMU emulates various hardware devices such as network cards, disk controllers, and graphics, redirecting their operations through host system calls, while KVM handles privileged processor instructions.
Virtio Technology and Paravirtualization. To minimize I/O emulation overhead, the virtio framework is used. It defines a standardized transport protocol and data structures (virtqueues) shared between guest and host. The guest loads a virtio driver that knows it is running in a virtual environment and exchanges requests with the backend (in QEMU or in the host kernel) through queues in shared memory, which greatly reduces the number of emulation traps per I/O operation.
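The idea behind a virtqueue can be shown with a toy model. The sketch below keeps the three parts of a split virtqueue (descriptor table, available ring, used ring) but drops everything else: there is no shared memory, no notification, and no descriptor chaining. ToyVirtqueue and its methods are hypothetical names.

```python
from collections import deque


class ToyVirtqueue:
    """Greatly simplified model of a split virtqueue."""

    def __init__(self, size):
        self.desc = [None] * size      # descriptor table: request buffers
        self.free = deque(range(size)) # unused descriptor indexes
        self.avail = deque()           # driver -> device: ready descriptors
        self.used = deque()            # device -> driver: completed ones

    def driver_add(self, buf):
        """Guest driver publishes a buffer for the device."""
        if not self.free:
            raise BufferError("virtqueue full")
        idx = self.free.popleft()
        self.desc[idx] = buf
        self.avail.append(idx)
        return idx

    def device_process(self, handler):
        """Host backend consumes all available buffers in one pass."""
        count = 0
        while self.avail:
            idx = self.avail.popleft()
            self.used.append((idx, handler(self.desc[idx])))
            count += 1
        return count

    def driver_collect(self):
        """Guest driver reaps completions and recycles descriptors."""
        results = []
        while self.used:
            idx, result = self.used.popleft()
            self.desc[idx] = None
            self.free.append(idx)
            results.append(result)
        return results
```

The point of the batching in device_process is that many requests can be handled per notification, instead of one trap per emulated register access.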
Interrupt Handling via Virtual APIC. To deliver interrupts to guest processors, KVM uses the virtual Advanced Programmable Interrupt Controller (APIC) architecture. Hardware support for posted interrupts in modern processors allows asynchronous injection of interrupt vectors into guest vCPUs without causing an expensive exit from virtualization mode (VM-exit), significantly reducing I/O latency.
Virtual Processor Scheduling. Each vCPU in KVM is an ordinary thread of the QEMU process, and the host scheduler manages these threads like any other. When the scheduler gives a time slice to a vCPU thread, KVM enters guest mode on that physical core (VMLAUNCH on first entry and VMRESUME afterwards on Intel, VMRUN on AMD). CPU shares and limits for virtual machines can be enforced with standard Linux mechanisms such as cgroups.
Saving and Loading VM State. KVM lets user space read and restore the guest execution context (processor registers and in-kernel device state), and QEMU combines this with guest memory and its own device state to save a virtual machine to a file or network stream. Live migration builds on the same mechanism: memory is copied in the background while the guest keeps running, pages dirtied in the meantime are tracked and re-sent, and the virtual machine is paused only briefly for the final transfer to the other physical node.
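The iterative copying of dirty pages can be modeled with a few lines of arithmetic. This is a toy sketch, assuming the guest re-dirties a fixed percentage of the pages transferred in each round; real migration code measures dirty rate and bandwidth instead. The function name and parameters are illustrative.

```python
def precopy_rounds(total_pages, dirty_percent, stop_threshold, max_rounds=30):
    """Simulate pre-copy live migration.

    total_pages    -- guest memory size in pages
    dirty_percent  -- pages re-dirtied during a round, as an integer
                      percentage of the pages sent in that round
    stop_threshold -- when this few pages remain, pause the guest
    Returns (pages sent in each live round, pages left for the
    final stop-and-copy phase).
    """
    to_send = total_pages
    sent_per_round = []
    for _ in range(max_rounds):
        if to_send <= stop_threshold:
            break
        sent_per_round.append(to_send)
        # While this round was being copied, the guest dirtied pages again.
        to_send = to_send * dirty_percent // 100
    return sent_per_round, to_send
```

With a dirty percentage of 100 or more the loop never converges and stops only at max_rounds, which mirrors the practical case of a guest that writes memory faster than the network can carry it.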
Peripheral Emulation and PCI Bus. Besides virtio, for compatibility with older operating systems, KVM together with QEMU supports full software emulation of PCI/PCIe buses and classic controllers. Guest accesses to I/O ports and memory-mapped ranges are intercepted and translated by QEMU into calls to the block layer or network stack of the host Linux.
Power Management and Timers. KVM provides guests with a paravirtualized clock source (kvmclock) that remains stable even when migrating between physical hosts with different clock frequencies. When an idle guest executes HLT (or MWAIT, depending on configuration), the vCPU is suspended and control returns to the host scheduler, which reduces the actual power consumption of the processor during guest idle times.
Memory Isolation and Security. Guest machines are isolated from each other by the nested page table mechanism, which gives every guest its own address space. Processor features such as Intel SMEP and SMAP, which stop kernel-mode code from executing or accessing user-mode pages, harden both the host kernel and guest kernels against privilege escalation exploits.
Interrupt and Timer Virtualization. Modern kernel versions use hardware APIC virtualization (AVIC on AMD and APICv on Intel), which lets guest OSes access many local APIC registers without a VM-Exit. This matters for high-frequency events: if every timer or interrupt operation had to be handled in software, the accumulated exit overhead would degrade latency-sensitive and real-time workloads inside the virtual machine.
Graphics and GPU Virtualization. KVM supports passthrough of physical graphics adapters via VFIO (Virtual Function I/O) and IOMMU technology. This allows the guest system to exclusively control the hardware GPU, sending rendering commands directly to video memory without emulator involvement, providing near-native performance in machine learning and 3D graphics tasks.
Bus and Storage Device Passthrough. With the in-kernel vhost-scsi backend for virtio-scsi and virtual host bus adapter functions, KVM lets the guest interact with SAN storage along a short path, and NVMe devices can be passed through whole. VFIO provides direct access at the interrupt and DMA level, which shortens the I/O path to a minimum and removes the bottleneck of a software block device emulator.
Tracing and Debugging System. KVM exposes kernel tracepoints that can be consumed with ftrace or perf, providing detailed records of VM-Exits, hypercalls, and interrupt injection. The kvm_stat tool shows in real time how often the guest exits and for which reasons, allowing engineers to pinpoint performance anomalies without stopping the workload.
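Trace output can also be summarized offline. The sketch below counts exit reasons in text lines of the kvm_exit tracepoint, assuming each line contains the event name followed by "reason NAME"; the exact line layout differs between kernel versions, so the regular expression is an assumption to adapt rather than a fixed format.

```python
import re
from collections import Counter

# Assumed layout: "... kvm_exit: ... reason EXIT_NAME ..."
_EXIT_RE = re.compile(r"kvm_exit:.*?\breason (\w+)")


def count_exit_reasons(trace_lines):
    """Count VM exits per reason from kvm_exit trace lines."""
    counts = Counter()
    for line in trace_lines:
        match = _EXIT_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

Calling count_exit_reasons(lines).most_common(5) gives a rough equivalent of the top rows that kvm_stat displays.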
Virtual File System and Guest Agents. A host-guest data exchange channel is implemented through virtio-serial. Over this channel the host uses the QEMU Guest Agent (qemu-ga) running inside the guest to read and write files and execute commands; clipboard synchronization is handled by a separate agent (the SPICE agent) over the same kind of channel. The guest agent interface also lets the administrator freeze the guest file systems before creating a consistent volume snapshot.
Scaling and NUMA Support. KVM is aware of the physical memory topology (Non-Uniform Memory Access). The hypervisor can present a virtual NUMA structure to the guest, mapping its vCPUs and local memory segments to specific sockets and cores of the physical server. This affinity binding prevents penalties for cross-socket data access inside a multiprocessor VM.
Memory Management and Deduplication. Kernel Same-page Merging (KSM) technology scans the host physical pages allocated to different virtual machines and merges pages identical in content into one marked as copy-on-write. This significantly saves RAM when running dozens of identical guest OSes, for example, in VDI environments.
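The saving from merging can be estimated with a toy model. This sketch identifies identical pages by a SHA-256 digest, which is a shortcut chosen for brevity: the kernel compares page contents and only scans memory that a process has marked as mergeable. The function name is illustrative.

```python
import hashlib


def merge_identical_pages(pages):
    """Model page deduplication.

    pages -- list of bytes objects, one per guest page
    Returns (mapping, saved): mapping[i] is the index of the shared
    copy backing page i, saved is the number of pages freed.
    """
    shared = {}
    mapping = []
    for page in pages:
        digest = hashlib.sha256(page).digest()
        # First occurrence gets a new shared slot; later ones reuse it.
        mapping.append(shared.setdefault(digest, len(shared)))
    return mapping, len(pages) - len(shared)
```

In the real mechanism a write to a merged page triggers copy-on-write, so the guest that modifies the page receives a private copy again.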
Guest Memory Compression and Swap. When the host runs short of memory, guest memory can be evicted to swap on the host like the memory of any other process. KVM also works with virtio-balloon, whose driver inside the guest returns unused pages to the hypervisor on request. If zswap is enabled on the host, pages are compressed in memory before being written to disk, which speeds up access to rarely used data.
Virtual Socket Interface. For fast communication between guest and host, the AF_VSOCK socket family is provided. With KVM it is carried over a virtio transport (virtio-vsock) and does not require any IP network configuration. Each endpoint is addressed by a CID (Context Identifier) and a port, giving a communication channel that is not subject to the latency and losses of a physical Ethernet network.
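From inside a guest, the socket is used like any stream socket except for the address format. The following is a minimal sketch, assuming a Linux guest with a vsock device, a Python build that defines socket.AF_VSOCK, and some service listening on the host; the function name and the port passed by the caller are hypothetical.

```python
import socket

VMADDR_CID_HOST = 2  # well-known CID that addresses the host


def vsock_send(port, payload, cid=VMADDR_CID_HOST, timeout=2.0):
    """Send payload to a vsock listener; addresses are (CID, port) pairs."""
    family = getattr(socket, "AF_VSOCK", None)
    if family is None:
        raise OSError("AF_VSOCK is not available in this Python build")
    with socket.socket(family, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        sock.connect((cid, port))
        sock.sendall(payload)
```

No IP address, route, or firewall rule is involved; the call fails with OSError when the guest has no vsock device or nothing listens on the chosen port.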
Instruction Emulation. KVM does not use binary translation, but some guest instructions cannot be executed directly by the hardware, for example real-mode code on processors without unrestricted guest support or instructions that access emulated memory-mapped devices. In these cases KVM falls back to a lightweight x86 instruction emulator inside the kernel, which decodes the instruction and applies its effect to the guest state in software, preserving compatibility with legacy software.

Comparisons

KVM vs Xen (Paravirtualization). Unlike Xen paravirtualization (PV), which requires a modified guest OS, KVM uses hardware extensions (Intel VT-x/AMD-V) for full virtualization without patching the guest kernel. Unmodified systems (Windows, Linux) therefore run out of the box, which simplifies deployment. Xen PV, in turn, can run on hardware without virtualization extensions, at the cost of supporting only adapted guests.
KVM vs QEMU (Pure Emulation). KVM acts as an accelerator for QEMU. On its own, QEMU can emulate the processor entirely in software (TCG mode), which is slow but works for foreign architectures; with KVM, guest code runs directly on the processor in hardware guest mode. The QEMU/KVM combination gives near-native performance for guests of the host machine's architecture while retaining QEMU's flexibility in emulating I/O devices.
KVM vs VMware ESXi (Architecture). ESXi uses a proprietary hypervisor kernel (vmkernel) that exclusively manages the hardware; its device drivers were historically adapted from Linux drivers by the vendor. KVM, on the other hand, turns a standard Linux kernel into a hypervisor and inherits all of its subsystems. Consequently, KVM works with any hardware supported by the mainline kernel, while ESXi is limited to the vendor's hardware compatibility list.
KVM vs Docker/LXC Containers (Isolation). This is a comparison of a hypervisor and OS-level jail isolation. KVM runs separate kernels for each virtual machine, ensuring complete system call isolation and the ability to run any OS (Linux, Windows), which is critical for the security of a multi-tenant environment. Containers use a shared host kernel, offering near-zero CPU overhead and instant start, but are inferior in isolation strictness and cross-platform capability.
KVM vs Hyper-V (OS Integration). Both solutions are tied to a general-purpose operating system but differ in architecture and ideology. Hyper-V is a microkernel-style hypervisor that runs beneath Windows Server, which acts as the parent management partition, and it is natively optimized for the Microsoft ecosystem. KVM is a module inside the monolithic Linux kernel, so the host Linux system itself is the hypervisor. This gives KVM openness, no licensing fees, and native support for the evolving Linux toolset (Ceph, Open vSwitch) in cloud orchestrators.

OS and driver support

KVM supports a wide range of guest operating systems (Linux, Windows, BSD) through a combination of hardware emulation and paravirtual virtio drivers. For Windows, WHQL-signed virtio-win driver packages are released. They provide paravirtual access to block devices (viostor) and the network card (netkvm) through shared ring buffers, as well as memory ballooning (Balloon), bypassing slow IDE or e1000 emulation.

Security

Kernel-level isolation is implemented through SELinux (the sVirt project). Each guest QEMU process runs with its own security label containing a randomly chosen MCS (Multi Category Security) category pair, and its image files (type svirt_image_t) carry the same categories. The kernel therefore blocks any attempt by one virtual machine's process to read another guest's image files or affect its memory, even if an attacker has broken out of the guest and taken control of its QEMU process.
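The label assignment can be sketched as drawing a category pair that no running guest uses yet. This is an illustrative model only, assuming 1024 categories and the common svirt_t label layout; it does not reproduce how libvirt actually allocates labels, and the function name is hypothetical.

```python
import random


def allocate_mcs_label(in_use, rng=random, categories=1024):
    """Pick an unused random category pair and build a process label.

    in_use -- set of category pairs already assigned; updated in place
    rng    -- object with a sample() method (random module or Random)
    """
    while True:
        low, high = sorted(rng.sample(range(categories), 2))
        pair = f"c{low},c{high}"
        if pair not in in_use:
            in_use.add(pair)
            return f"system_u:system_r:svirt_t:s0:{pair}"
```

Because two processes may only share objects when their categories match, giving each guest a distinct pair is what keeps one QEMU process away from another guest's images.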

Logging

The libvirt audit subsystem records life cycle events of virtual machines, sending structured messages to audit.log with uuid, vm-pid, and security context fields. Records of type VIRT_RESOURCE detail changes in resource configuration, for example a disk hot-add (resrc=disk) or a memory size change (resrc=mem), which makes it possible to trace how an image was migrated or cloned.

Limitations

Despite native integration with the kernel, KVM is limited by hardware requirements: on x86 the processor must provide Intel VT-x or AMD-V. It also has architecture-specific dependencies. For example, nested virtualization on POWER systems requires complex emulation of the interrupt controller (XICS on top of XIVE) to prevent a host crash, and enabling Credential Guard protection in a Windows guest imposes specific restrictions on direct PCI device access.

History and development

Avi Kivity started development of the module at Qumranet in 2006, and it was merged into the mainline Linux kernel in version 2.6.20, released in February 2007; the community preferred a lightweight solution built into the kernel over the external Xen hypervisor. Red Hat acquired Qumranet in 2008, and KVM went on to become the most widely used hypervisor in OpenStack deployments and the basis for containers isolated in lightweight virtual machines.