What is Memory Overcommit (Virtual memory allocation exceeding Physical memory)

Memory overcommit is an operating-system policy that lets processes be promised more virtual memory in total than the available physical RAM and swap space can back. It relies on the assumption that processes will not use all the memory they request at the same time, which improves hardware utilization.

This technique is widely used in virtualization environments and cloud platforms. Hypervisors such as KVM, VMware ESXi, and Xen actively use memory overcommitment to increase the density of virtual machines on a physical server. This allows running more guest systems with a total configured RAM exceeding the host’s hardware limit, which is economically beneficial for server consolidation.

The main drawback of overcommit is performance degradation when resources genuinely run short. When the total memory actively used by guest systems approaches the physical limit, the hypervisor forcibly reclaims pages, which can cause a sharp drop in performance or stall critical services. A second risk is instability: heavy reliance on slow swap and on memory reclamation can provoke deadlocks or, on Linux systems, mass process termination by the Out-Of-Memory (OOM) Killer.

How it works

The principle rests on the gap between reserved and actually used memory. When a process requests memory, the operating system only reserves a range of virtual addresses; it does not immediately allocate physical frames. A frame is allocated on the first access to each page, through the page fault mechanism. As long as the combined working set of all processes fits in physical RAM, overcommit goes unnoticed.

When a shortage occurs, reclamation mechanisms are activated: balloon drivers in the guest cooperatively take unused pages from the guest OS and return them to the hypervisor; Kernel Samepage Merging (KSM) scans memory, finds identical pages, and merges them into one physical page marked copy-on-write; memory compression packs rarely used pages into a small in-memory cache, deferring swap to disk.

If pressure keeps rising, the system moves to harsher measures. In Linux, the OOM Killer is triggered and heuristically selects a victim process based on its memory usage and priority adjustment, while VMware ESXi pushes the guest OS to use its own swap. The key safety parameter in Linux is vm.overcommit_ratio, which in strict accounting mode sets the share of physical RAM that counts toward the commit limit, so that runaway allocation fails early instead of causing a catastrophic failure.
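
The lazy allocation described above can be observed directly on Linux. The following is a minimal sketch, not a benchmark: it maps an anonymous region, writes to only part of it, and compares the resident set size (RSS) read from /proc/self/statm at each step. The function names `resident_bytes` and `measure_demand_paging` and the chosen sizes are illustrative assumptions; exact numbers vary with the kernel, transparent huge pages, and the Python runtime.

```python
import mmap

PAGE = mmap.PAGESIZE
MiB = 1024 * 1024


def resident_bytes() -> int:
    """Return this process's resident set size in bytes (Linux only)."""
    with open("/proc/self/statm") as f:
        # The second field of statm is the number of resident pages.
        return int(f.read().split()[1]) * PAGE


def measure_demand_paging(reserve: int, touch: int):
    """Map `reserve` bytes, write to the first `touch` bytes, and return
    the RSS (before mapping, after mapping, after touching)."""
    before = resident_bytes()
    region = mmap.mmap(-1, reserve, flags=mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS)
    try:
        after_map = resident_bytes()
        # One write per page: each first write faults in a physical frame.
        for offset in range(0, touch, PAGE):
            region[offset] = 1
        after_touch = resident_bytes()
    finally:
        region.close()
    return before, after_map, after_touch


if __name__ == "__main__":
    before, after_map, after_touch = measure_demand_paging(64 * MiB, 16 * MiB)
    print(f"RSS growth after mmap:  {(after_map - before) / MiB:.1f} MiB")
    print(f"RSS growth after touch: {(after_touch - after_map) / MiB:.1f} MiB")
```

On a typical system the first figure should stay close to zero while the second tracks the amount of memory actually written, which is the gap that overcommit exploits.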

Memory Overcommit functionality

Management modes via vm.overcommit_memory. The Linux kernel provides three global policies controlled by the vm.overcommit_memory parameter. Value 0 enables a heuristic algorithm that rejects obviously excessive requests but allows reasonable overcommitment to reduce swap usage. Value 1 disables all checks, so the kernel never refuses a virtual memory allocation on overcommit grounds, regardless of physical resource availability. Value 2 implements strict accounting: total commitments may not exceed swap plus a configurable percentage of physical RAM.
Default heuristic mode. Mode 0 applies a simple estimate and refuses an allocation only when it requests an obviously excessive amount of address space. The aim is to make a single wildly wrong allocation fail cleanly while still letting the system benefit from the fact that, under normal conditions, applications touch only part of the memory they request.
Always overcommit mode. Mode 1 disables overcommit checks at the allocation stage, so malloc or mmap never fails on overcommit grounds (a call can still fail for other reasons, such as address-space exhaustion). This mode is intended for specific scientific applications that manipulate huge sparse arrays, where the logical size of the data structure far exceeds physically available memory but only a tiny fraction of the address space is ever filled.
Strict accounting without overcommit mode. Policy 2 ensures that the amount of committed memory (Committed_AS) does not exceed a hard limit (CommitLimit). The limit is calculated as the sum of swap space and a fraction of physical memory. In this mode, in most situations a process will not be killed by the OOM Killer while accessing already allocated pages; instead it receives an error from the memory allocation system call once the limit is exhausted.
Managing the limit under strict accounting. When vm.overcommit_memory=2, the limit is set either via vm.overcommit_ratio, as a percentage of RAM, or via vm.overcommit_kbytes, as an absolute value. The formula is: CommitLimit = SwapTotal + (RAM - huge pages) * overcommit_ratio / 100. The swap term is the total configured swap, not the currently free swap. With the default ratio of 50 and no swap, processes can commit only half of the machine’s physical RAM.
Memory state monitoring. The current accounting state is exported through the proc virtual file system: Committed_AS is the amount of memory the system has already promised to processes, and CommitLimit is the current ceiling for such promises, which is enforced only in strict mode (vm.overcommit_memory=2). Comparing these two values in /proc/meminfo lets the administrator estimate how close the system is to refusing allocations or coming under memory pressure.
Private mapping accounting principles. The main contribution to Committed_AS growth comes from private and writable memory mappings. Each instance of such a mapping is accounted for by its full declared size because it may potentially require physical page allocation upon copy-on-write. Anonymous shared mappings also require full accounting since they have no file backing for data flushing.
Lightweight file mapping accounting. File-backed mappings are accounted for at zero cost if they are shared or read-only. The reason is that the backing store for such pages is the file on disk rather than swap space. The kernel does not need to reserve swap for them because the data can always be reread from persistent storage.
Ignoring the MAP_NORESERVE flag. In strict accounting mode (vm.overcommit_memory=2), the MAP_NORESERVE flag passed to the mmap system call is completely ignored by the kernel. The logic is that no-overcommit mode must provide absolute guarantees of memory availability. Allowing applications to arbitrarily exclude their mappings from accounting would make it impossible to comply with the hard CommitLimit.
Stack growth pitfalls. In C programs the stack grows implicitly, in effect through an implicit mremap, and its eventual size is not known at startup. To obtain firm guarantees when running close to the strict-accounting limit, a program must reserve its maximum expected stack size up front with an explicit mmap call. Ordinary stack usage does not cause problems, but it is a risk in tightly constrained environments.
OOM Killer mechanism. When reclamation fails and physical memory is exhausted, the OOM Killer is activated. Its task is to free resources by forcibly terminating one or more processes. Victim selection is based on a badness score that reflects the proportion of memory used relative to the system’s available resources.
Calculating process badness score. The algorithm assigns each process a value from 0 to 1000, where 1000 indicates using 100% of the available memory pool. The score is driven mainly by how much memory the process occupies; older heuristics also gave extra weight to processes that spawned many children. Older kernels additionally gave privileged root processes a small 3% discount to make them less likely to be killed first, so this behavior should not be assumed on a current kernel.
Manual adjustment via oom_score_adj. The /proc/[pid]/oom_score_adj interface allows userspace to directly influence the kill priority. The value ranges from -1000 to +1000 and is added to the kernel’s badness score. Setting the value to -1000 completely excludes the process from the OOM Killer’s target list because its final score will always be zero. A value of +1000 makes the process a primary victim.
Legacy oom_adj parameter. The historical /proc/[pid]/oom_adj interface, with a range from -16 to +15, is preserved for backward compatibility. In older kernels the special value -17 disabled OOM killing for a process entirely. The kernel converts writes to oom_adj and oom_score_adj into each other by linear scaling; however, the documentation strongly recommends using the newer oom_score_adj.
Atomic accounting at fork stage. The fork system call triggers a check of the memory overcommit limit. The kernel checks whether creating a copy of the parent’s address space will remain within the allowed limit. If the child process could hypothetically require writing to all private pages and that would exceed the limit, the fork call fails, preventing cascading memory exhaustion.
Accounting for permission changes. mprotect calls that change page protections update the global Committed_AS counter. Turning a non-writable memory segment into a private, writable one automatically increases the amount of accounted memory. Conversely, removing the write permission can reduce the load on the limit because pages no longer need to be reserved for potential copy-on-write.
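
The strict-accounting limit and the /proc/meminfo counters described above can be sketched as follows. This is an illustration under stated assumptions, not kernel code: the helper names (`parse_meminfo`, `commit_limit_kb`, `commit_headroom_kb`) and the sample values are hypothetical, and the formula follows the kernel documentation, which uses total swap and subtracts huge pages from RAM.

```python
def parse_meminfo(text: str) -> dict:
    """Parse /proc/meminfo-style text into {field: integer as printed}.

    Most fields are in kB; a few (such as HugePages_Total) are plain counts.
    """
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def commit_limit_kb(mem_total_kb: int, swap_total_kb: int,
                    overcommit_ratio: int = 50, hugetlb_kb: int = 0) -> int:
    """CommitLimit = (RAM - huge pages) * overcommit_ratio / 100 + total swap."""
    return (mem_total_kb - hugetlb_kb) * overcommit_ratio // 100 + swap_total_kb


def commit_headroom_kb(meminfo: dict) -> int:
    """How much more memory can be committed before the limit is reached."""
    return meminfo["CommitLimit"] - meminfo["Committed_AS"]


# Hypothetical sample: 16 GiB of RAM, no swap, default ratio of 50.
SAMPLE = """\
MemTotal:       16777216 kB
SwapTotal:             0 kB
CommitLimit:     8388608 kB
Committed_AS:    6291456 kB
"""

if __name__ == "__main__":
    info = parse_meminfo(SAMPLE)
    limit = commit_limit_kb(info["MemTotal"], info["SwapTotal"], 50)
    print(f"computed CommitLimit: {limit} kB")
    print(f"commit headroom:      {commit_headroom_kb(info)} kB")
```

To inspect a live system, the contents of /proc/meminfo can be passed to `parse_meminfo` instead of the sample text; the headroom is only meaningful as a hard limit when vm.overcommit_memory=2.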

Comparisons

Memory Overcommit vs Memory Ballooning. Memory overcommit is a hypervisor strategy that allows allocating more memory to virtual machines than is physically available, assuming incomplete utilization. Ballooning is a mechanism for its implementation: a driver inside the guest OS inflates, taking unused memory and returning it to the hypervisor for other needs. If overcommit describes the allocation policy, then ballooning is an active reclamation tool that activates during shortages.
Memory Overcommit vs Transparent Page Sharing (TPS). TPS implements page deduplication: the hypervisor scans memory and merges identical pages from different VMs into a single physical page, saving resources. Unlike competitive redistribution via ballooning, TPS eliminates data redundancy rather than fighting idleness. However, due to the risks of side-channel attacks on shared pages, this feature is often disabled by default.
Memory Overcommit vs Memory Overcommitment (AHV vs ESXi). In Nutanix AHV, the overcommit mechanism is implemented through policies and prioritized ballooning reclamation followed by swap to disk. In VMware ESXi, a layered strategy is applied: first TPS is activated, then ballooning, then memory compression, and only as a last resort swapping to SSD cache or hard disk. This cascading VMware model aims to minimize performance degradation.
Memory Overcommit vs Linux Overcommit Memory. In Linux, the vm.overcommit_memory parameter controls the memory allocation policy for processes, not virtual machines. Values 0, 1, or 2 determine whether the kernel lets a process request more address space than the sum of RAM and swap could back with real pages. Unlike hypervisor overcommit, which dynamically returns memory via ballooning, control here rests on OOM Killer heuristics or static limits, without any cooperation from the application.
Memory Overcommit vs Memory Hot Plug. Overcommit changes what backs a guest’s memory without the guest OS’s knowledge, whereas hot plug adds or removes RAM on the fly in a way the OS can see. With hot plug, the OS observes the change in total RAM and can use the new memory directly. Overcommit operates within an assigned limit and creates an illusion of resources; if the sizing assumptions are wrong, it causes swapping and sharp performance drops.

OS and driver support

In Linux, overcommit is controlled by the vm.overcommit_memory kernel parameter, which takes three values: 0 (heuristic mode: obviously excessive requests are rejected but moderate overcommit is allowed), 1 (always allow overcommit, intended for applications such as scientific codes that use sparse arrays), and 2 (strict limit: total committed memory may not exceed swap plus a configurable percentage of physical RAM). At the system-call level, mmap and brk reserve virtual addresses without immediately allocating physical pages; a mapping can pass the MAP_NORESERVE flag to opt out of swap reservation. Demand paging takes over on the first access to a page: the page fault handler allocates the actual page, and under cgroup v2 the page is charged against the cgroup’s memory.max limit at that point.
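
As a hedged illustration of the system-call behavior described above, the sketch below reads the current policy and reserves a sparse anonymous region with MAP_NORESERVE. The names `OVERCOMMIT_MODES`, `read_overcommit_mode`, and `map_sparse` are hypothetical helpers. The fallback value 0x4000 for MAP_NORESERVE is an assumption that holds on common Linux architectures such as x86-64 and arm64; older Python versions do not export the constant.

```python
import mmap

# Assumption: 0x4000 is MAP_NORESERVE on common Linux architectures; use the
# mmap module's own constant when the running Python version provides it.
MAP_NORESERVE = getattr(mmap, "MAP_NORESERVE", 0x4000)

OVERCOMMIT_MODES = {
    0: "heuristic overcommit",
    1: "always overcommit",
    2: "never overcommit (strict accounting)",
}


def read_overcommit_mode(path: str = "/proc/sys/vm/overcommit_memory") -> int:
    """Return the current vm.overcommit_memory value (Linux only)."""
    with open(path) as f:
        return int(f.read().strip())


def map_sparse(size: int) -> mmap.mmap:
    """Reserve `size` bytes of private anonymous memory without asking the
    kernel to reserve swap for it; pages are allocated on first access."""
    flags = mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS | MAP_NORESERVE
    return mmap.mmap(-1, size, flags=flags)


if __name__ == "__main__":
    mode = read_overcommit_mode()
    print(f"vm.overcommit_memory = {mode}: {OVERCOMMIT_MODES.get(mode, 'unknown')}")
    region = map_sparse(128 * 1024 * 1024)
    region[0] = 42                 # faults in the first page only
    region[len(region) - 1] = 7    # faults in the last page only
    print(f"reserved {len(region)} bytes, touched two pages")
    region.close()
```

Only the two touched pages need physical frames here; the rest of the region remains a reservation of address space.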

Security

From a security perspective, aggressive overcommit exposes memory side channels, especially in virtualized environments where an attacker in a guest VM can use page deduplication to learn the address-space layout of a neighboring virtual machine and so defeat ASLR. CVE-2015-2877 shows how an attacker can measure copy-on-write latencies to detect shared pages and then brute-force the base addresses of libraries and executables. Mitigation requires either disabling memory deduplication entirely (KSM in KVM, TPS in VMware, Page Fusion in VirtualBox) or a combination of randomizing the content of static pages, increasing ASLR entropy, and using 2 MB huge pages to reduce the attacker’s precision.

Logging

Monitoring and logging of the consequences of memory overcommit is available through the cgroup v2 filesystem, specifically the memory.events file, which holds counters of critical events. oom_kill is the number of processes in the cgroup killed by the OOM killer. max is the number of times the cgroup’s usage was about to exceed the memory.max limit; if reclaim cannot bring usage back down, this leads to an OOM-killer invocation or to an allocation failing with -ENOMEM. low is the number of times memory was reclaimed from the cgroup even though its consumption was below the memory.low boundary, which indicates that the memory protection itself is overcommitted. Additionally, on systems with DAMON, proactive page reclamation is driven by access monitoring, which makes it possible to track cold pages and to record how successful pageout schemes are without involving the OOM killer.
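
A monitoring script built on the counters above might look like the following minimal sketch. It assumes the flat "name value" line format of memory.events; the helper names `parse_memory_events` and `new_events` and the sample snapshots are hypothetical, and a real script would read the file from a specific cgroup directory under /sys/fs/cgroup.

```python
def parse_memory_events(text: str) -> dict:
    """Parse memory.events content ("name value" per line) into a dict."""
    events = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            events[parts[0]] = int(parts[1])
    return events


def new_events(previous: dict, current: dict) -> dict:
    """Return the counters that increased between two snapshots."""
    return {
        name: value - previous.get(name, 0)
        for name, value in current.items()
        if value > previous.get(name, 0)
    }


if __name__ == "__main__":
    # Hypothetical snapshots taken a few seconds apart.
    earlier = parse_memory_events("low 0\nhigh 12\nmax 3\noom 1\noom_kill 1\n")
    later = parse_memory_events("low 0\nhigh 15\nmax 4\noom 2\noom_kill 2\n")
    for name, delta in new_events(earlier, later).items():
        print(f"{name} increased by {delta}")
```

Comparing snapshots rather than reading absolute values matters because the counters only ever grow; a rising oom_kill or low counter is the signal worth alerting on.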

Limitations

The main architectural limitation is that even in strict mode (OVERCOMMIT_NEVER, that is, vm.overcommit_memory=2) the kernel cannot guarantee that the OOM killer never runs: physical memory is allocated asynchronously relative to virtual reservation, and the kernel’s own memory needs cannot be predicted at reservation time. In environments with memory cgroups, exceeding the memory.max limit is not treated as overcommit by the global vm.overcommit_memory policy and still invokes the OOM killer within the cgroup. Attempts to implement strict accounting on top of mem_cgroup_get_max() run into the absence of background reclaim for cgroups, which causes spurious allocation failures even when reclaimable pages exist.

History and evolution

Historically, overcommit has been present in Linux since its earliest kernels in 1991, well before version 1.0: Linus Torvalds implemented demand loading and shared pages, which allowed about 30 instances of /bin/sh to run on a machine with 6 MB of memory, more than would have fit without page sharing. Strict overcommit modes were added by Alan Cox in February 2002. In 2021-2022, development moved toward integration with cgroup v2 and proactive memory reclamation via DAMON/DAMOS: the damon_reclaim module evicts cold pages based on data access patterns without involving the OOM killer, using time and size quotas for the memory it processes. In parallel, attempts to incorporate memcg accounting into the overcommit logic were pursued but rejected because memory allocation and deallocation are fundamentally asynchronous.