What are Large Pages (memory page enlargement)

Large Pages (HugePages on Linux) is a memory management method where the operating system maps memory in blocks of 2 MB or 1 GB (on x86-64) instead of the standard 4 KB pages. Larger pages reduce the number of Translation Lookaside Buffer (TLB) entries needed to cover the same amount of memory, which cuts TLB misses, speeds up data access, and lowers CPU overhead.

This technology is used in database management systems such as Oracle and PostgreSQL, where fast access to large data arrays is required. Virtualization based on KVM (which turns the Linux kernel into a hypervisor) and QEMU (a machine emulator and hardware virtualizer) uses huge pages to improve guest system performance. Scientific computing and machine learning tasks use them to hold multidimensional arrays. Java servers and in-memory caching systems also benefit from reduced TLB misses.

Typical problems

The main difficulty is memory fragmentation. The operating system struggles to find contiguous large blocks, especially after long periods of operation, leading to performance degradation. Allocating such pages requires explicit privileges and special tuning. Static reservation at system startup removes memory from the general pool. Some applications designed only for 4 KB blocks may become unstable when transparent huge page substitution is used.

How it works

Virtual memory is divided into standard pages of 4096 bytes. The processor translates virtual addresses to physical ones using the Translation Lookaside Buffer (TLB), a small associative cache of recent translations. The TLB has a limited capacity, typically a few hundred to a few thousand entries. When an application works intensively with several gigabytes of memory, the number of pages it touches can reach millions, causing frequent TLB misses, each of which requires a lengthy walk of the page tables in RAM.

HugePages uses the larger page sizes that the processor's paging hardware supports, with matching support in the kernel. Instead of 4 KB pages, pages of 2 megabytes or 1 gigabyte are used, so one TLB entry covers a memory area that is 512 or 262144 times larger than standard. Addressing 4 GB of data with 2 MB pages takes only 2048 entries instead of about a million (1048576), so far more of the working set stays within reach of the TLB and most lookups hit in the buffer.
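
The arithmetic above can be checked with a few lines of Python. This is only a sketch of the counting argument: the function name and the size table are illustrative, and the result is the number of pages a working set spans, not a model of any specific processor's TLB.

```python
KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB

# Page sizes discussed in the text (x86-64 values).
PAGE_SIZES = {"4 KB": 4 * KIB, "2 MB": 2 * MIB, "1 GB": GIB}


def tlb_entries_needed(working_set_bytes: int, page_size_bytes: int) -> int:
    """Number of pages, and so TLB entries, needed to map the working set."""
    # Ceiling division: a partially used page still needs its own entry.
    return -(-working_set_bytes // page_size_bytes)


if __name__ == "__main__":
    working_set = 4 * GIB
    for name, size in PAGE_SIZES.items():
        entries = tlb_entries_needed(working_set, size)
        print(f"{name} pages: {entries} entries to cover 4 GB")
```

For a 4 GB working set this gives 1048576 entries with 4 KB pages, 2048 with 2 MB pages, and 4 with 1 GB pages.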

Huge pages come in two modes: static, where the administrator reserves a pool at boot, and transparent, where the kernel itself attempts to defragment memory and merge small pages into large ones without changes to application code. The second approach is more convenient but is prone to delays during memory compaction and does not guarantee that a huge page will be allocated.
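
To see which of the two modes a given Linux host is using, one can inspect the kernel's own status files. The sketch below assumes the usual locations /proc/meminfo and /sys/kernel/mm/transparent_hugepage/enabled. The helper names are made up for illustration, and the parsers take text as input so they can be tried without those files.

```python
from pathlib import Path

# Assumed standard locations on a Linux host.
MEMINFO = Path("/proc/meminfo")
THP_ENABLED = Path("/sys/kernel/mm/transparent_hugepage/enabled")


def parse_thp_mode(text: str) -> str:
    """Return the active THP mode from text such as 'always [madvise] never'."""
    for token in text.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    raise ValueError(f"no active mode marked in {text!r}")


def parse_static_pool(meminfo_text: str) -> dict:
    """Extract the static pool counters (page counts, page size in kB)."""
    wanted = ("HugePages_Total", "HugePages_Free", "HugePages_Rsvd", "Hugepagesize")
    pool = {}
    for line in meminfo_text.splitlines():
        key, _, rest = line.partition(":")
        if key in wanted and rest.split():
            pool[key] = int(rest.split()[0])
    return pool


def read_or_none(path: Path):
    """Read a status file, or return None where it does not exist."""
    try:
        return path.read_text()
    except OSError:
        return None


if __name__ == "__main__":
    meminfo = read_or_none(MEMINFO)
    if meminfo is not None:
        print("static pool:", parse_static_pool(meminfo))
    thp = read_or_none(THP_ENABLED)
    if thp is not None:
        print("THP mode:", parse_thp_mode(thp))
```

A HugePages_Total of zero together with a THP mode of always or madvise means the host relies on the transparent mode only.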

Large Pages features

Pool reservation mechanism (hugetlbfs). The static approach requires pre-reserving a fixed number of pages via sysctl, sysfs, or kernel boot parameters. The administrator writes the required number to /proc/sys/vm/nr_hugepages, after which the kernel allocates physically contiguous blocks that cannot be used for other system needs, creating a static pool.
Filesystem interface. Access to reserved pages is provided through a special pseudo-filesystem called hugetlbfs. The application creates files under the mount point and maps them into memory via mmap. Mounting and configuring the filesystem requires superuser privileges, and the mount's ownership and mode, together with the process umask, must allow the application to create a file for each data segment.
Anonymous mapping MAP_HUGETLB. The MAP_HUGETLB flag of mmap removes the requirement to mount hugetlbfs. When used together with MAP_ANONYMOUS, it creates a mapping that is not tied to any file. Memory is reserved directly from the pool, and when the process terminates or munmap is called, pages are automatically returned to the pool without leaving leftover files.
libhugetlbfs programmatic interface. The library provides an abstraction layer over hugetlbfs that lets applications use huge pages for malloc() calls without modifying source code. Loaded via LD_PRELOAD, the library hooks the standard memory allocator so that the heap is backed by huge pages, and it can also place program segments (BSS, data, text) into large pages.
Static reservation problem. The drawback of hugetlbfs is the need to accurately calculate the pool size before launching applications. Unused reserved pages are excluded from the total available RAM, leading to inefficient memory usage. A configuration error can cause OOM (Out of Memory) during system boot if reservation is too aggressive.
Transparent Huge Pages (THP). Unlike a static pool, the THP mechanism works in the background, dynamically promoting 4 KB pages to 2 MB without user intervention. On a page fault in an eligible region the kernel attempts to allocate a huge page, and if that is impossible due to fragmentation, it falls back transparently to standard pages without returning an error to the application.
Always allocation policy. When set to always, the kernel tries to back any eligible anonymous mapping with huge pages. This reduces TLB misses but can bloat physical memory use when the application relies on overcommit, that is, allocates a virtual address space larger than the memory it actually touches: each sparsely touched region can end up consuming a full 2 MB page.
Madvise allocation policy. This mode restricts THP usage to regions explicitly marked by the application via the madvise(MADV_HUGEPAGE) system call. It suits embedded systems and tasks requiring deterministic memory consumption, because the kernel no longer expands physical memory use in regions the application did not mark.
Khugepaged daemon. khugepaged is a kernel background thread that scans virtual memory and asynchronously collapses runs of aligned base pages into a single huge page. This raises huge page coverage on the fly, even for applications that were started before THP activation or that do not use explicit hints.
Memory defragmentation control. The defrag parameter in sysfs determines the memory compaction strategy when a huge page cannot be allocated immediately. Defer mode delegates reclaim and compaction to the background threads kswapd and kcompactd, keeping those delays out of the application's critical path. Always mode makes the faulting process stall for direct reclaim and compaction, which can add noticeable latency and raises the chance of getting a huge page, but still does not guarantee it.
Impact on TLB and virtualization. The key performance factor is reducing Translation Lookaside Buffer misses. One TLB entry for a 2 MB huge page covers a memory area that would require 512 entries in normal mode. In virtualization environments with nested page tables (Intel EPT and AMD NPT, the hardware second-level address translation mechanisms), the effect is amplified: a TLB miss triggers a two-dimensional walk through both guest and host page tables, and huge pages at either level remove steps from it.
max_ptes_none parameter. Controls how many empty PTEs khugepaged tolerates within a scanned 2 MB range when deciding whether to collapse it. A high value allows even sparsely filled regions to be merged, increasing huge page coverage at the cost of higher physical memory consumption. A low value requires nearly full occupancy of the 2 MB range for merging.
Huge Zero Page usage. To optimize reading of large uninitialized anonymous memory areas, the kernel can substitute a huge zero page. When reading, instead of allocating physical memory for each 4K page, a single read-only huge page filled with zeros is used. On first write, copy-on-write (COW) occurs with allocation of a real page.
tmpfs/shmem support. Transparent Huge Pages can be used for in-memory filesystems. The huge=always mount option enables huge page allocation attempts for tmpfs, which matters for inter-process communication through shared memory (shmem) and for the shared anonymous mappings used in graphics drivers and Android Ashmem.
Reducing kernel overhead. When using 2 MB pages instead of 4 KB, the page fault frequency for sequential memory access decreases by up to a factor of 512. This sharply reduces transitions between user mode and kernel mode, as well as the fixed per-fault cost of entering the exception handler for clear-page and copy-page operations.
Two-phase allocation strategy. When reserving a huge page pool at boot time, if the requested size exceeds 90% of RAM, the kernel splits the allocation into two phases. This prevents peak memory consumption that could be exacerbated by batch freeing of vmemmap pages and avoids OOM Killer activation during system startup on large servers.
Fragmentation and deferred compaction problem. Even with sufficient free memory, a huge page may fail to allocate due to fragmentation. External fragmentation requires compaction, an expensive operation of migrating occupied pages. Asynchronous operation of kcompactd minimizes latency but creates a window during which performance may be unstable.
Disabling THP for databases. OLTP workloads (MySQL, PostgreSQL, MongoDB) often show performance degradation or sharp latency spikes when THP is enabled. Aggressive page collapsing by khugepaged and synchronous defragmentation can conflict with the internal caches and memory management policies of the DBMS. A common recommendation for such hosts is to set transparent_hugepage=never on the kernel command line in the bootloader configuration.
Monitoring khugepaged activity. For performance debugging, the counters pages_collapsed (number of successfully merged pages) and full_scans (number of full scanner passes) are exported in sysfs. The scan_sleep_millisecs setting adjusts the scan frequency. Setting it to 0 makes the daemon scan continuously at maximum load, which is useful for benchmarking but dangerous in production.
Forced mapping of ELF segments. The hugeedit utility from the libhugetlbfs package allows embedding flags into a binary ELF file, instructing the loader to place code and data sections exclusively in huge pages. Together with hugectl, this lets administrators manage huge page usage strategy without recompiling code, specifying binary masks to select the appropriate page size.
Fine-tuning the max_ptes_swap parameter. This parameter limits how many page table entries in a candidate range may refer to swapped-out pages; if the limit is exceeded, khugepaged refuses to collapse the range. A high value forces merging even when many pages must first be read back from swap, potentially generating I/O spikes. A zero value means that ranges containing any swapped-out page are never collapsed, so khugepaged triggers no swap-in.
Post-process exit behavior. Files created in hugetlbfs continue holding huge pages even after the creating process terminates, until the file is explicitly deleted. This can lead to system-level memory leaks. Using MAP_HUGETLB | MAP_ANONYMOUS avoids this drawback, as resources are automatically freed when the process address space is destroyed.
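
The anonymous MAP_HUGETLB mapping described in the list above can be sketched from Python. The sketch makes several assumptions: a Linux host, a 2 MB default huge page size, and a MAP_HUGETLB value of 0x40000 (the x86-64 and arm64 value, used only if the mmap module does not export the constant). The function name is illustrative. When the pool is empty the kernel refuses the mapping, so the sketch falls back to ordinary pages.

```python
import mmap

HUGE_PAGE = 2 * 1024 * 1024  # assumed default huge page size (2 MB)

# Assumption: Linux value of MAP_HUGETLB on x86-64 and arm64,
# used only when the mmap module itself does not define it.
MAP_HUGETLB = getattr(mmap, "MAP_HUGETLB", 0x40000)


def map_anonymous(length: int):
    """Map anonymous memory, preferring the reserved huge page pool.

    Returns (mapping, used_huge_pages).
    """
    # Huge page mappings should be a whole number of huge pages.
    length = -(-length // HUGE_PAGE) * HUGE_PAGE
    base_flags = mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS
    try:
        return mmap.mmap(-1, length, flags=base_flags | MAP_HUGETLB), True
    except OSError:
        # Typically ENOMEM: no free huge pages left in the pool.
        return mmap.mmap(-1, length, flags=base_flags), False


if __name__ == "__main__":
    mapping, used_huge = map_anonymous(3 * 1024 * 1024)
    mapping[:5] = b"hello"
    print(f"mapped {len(mapping)} bytes, huge pages used: {used_huge}")
    # Closing unmaps the region; huge pages go back to the pool and
    # no file is left behind, unlike with a hugetlbfs file.
    mapping.close()
```

The fallback is a design choice for the sketch: a program that needs the guarantee of huge pages would treat the failure as an error instead.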

Comparisons

HugePages vs Transparent HugePages (THP). Standard HugePages require manual reservation and configuration via hugetlbfs, and in return guarantee that the reserved large pages are available to the application. Transparent HugePages work automatically, invisibly replacing 4 KB pages with 2 MB pages where possible. The price of THP convenience is latency: the kernel may start defragmenting memory to assemble a large block at an inopportune moment, which hurts latency-sensitive workloads such as databases.
HugePages vs mmap with MAP_HUGETLB. HugePages reserve a physical memory pool via the kernel, providing a guaranteed amount, and hugetlbfs exposes that pool as files. The MAP_HUGETLB flag allows applications to request pages from the same pool directly via anonymous mapping without mounting a filesystem. Either way, mmap fails if the administrator has not filled the pool in advance. hugetlbfs remains the more convenient interface when unrelated processes must share a named region of reserved huge pages.
HugePages vs Shared Memory (shmhuge). Regular shared memory (shm) allocates buffers using standard 4 KB pages, so every attached process needs many page table and TLB entries to map the same buffer. Placing segments in huge pages (shmget() with the SHM_HUGETLB flag) reduces TLB misses when accessing shared data from different processes, but requires manual pool reservation. This matters for high-load systems where multiple processes actively read the same large I/O buffer.
HugePages vs Large Pages in Windows. Linux and Windows share the same goal: increasing the base memory block to take pressure off the TLB. The difference lies in management: Linux offers administrator-reserved pools or transparent allocation, whereas in Windows server applications such as SQL Server request Large Pages directly through the memory allocation API (VirtualAlloc with MEM_LARGE_PAGES), with no filesystem interface involved.
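
On the THP side of the first comparison above, an application can opt in for a single region with madvise(MADV_HUGEPAGE). The sketch assumes Linux and Python 3.8 or later, where mmap objects have a madvise method. The function name is illustrative, and the hint is only a request that the kernel may ignore.

```python
import mmap


def map_with_thp_hint(length: int):
    """Map ordinary anonymous memory and ask the kernel to back it with THP.

    Returns (mapping, hinted), where hinted says whether the hint was accepted.
    """
    mapping = mmap.mmap(-1, length, flags=mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS)
    advice = getattr(mmap, "MADV_HUGEPAGE", None)  # absent on non-Linux builds
    hinted = False
    if advice is not None:
        try:
            mapping.madvise(advice)
            hinted = True
        except OSError:
            # For example EINVAL when the kernel was built without THP.
            pass
    return mapping, hinted


if __name__ == "__main__":
    region, hinted = map_with_thp_hint(8 * 1024 * 1024)
    region[:5] = b"hello"  # the first touch is where a huge page may be allocated
    print(f"THP hint accepted: {hinted}")
    region.close()
```

Unlike a MAP_HUGETLB mapping, this one never fails for lack of huge pages: if none can be assembled, the region is backed by 4 KB pages.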

OS and driver support

Large page support requires deep integration at the kernel and device driver level. In Linux, the kernel provides two main mechanisms: static HugePages, pre-reserved by the administrator via sysctl or the kernel command line in the bootloader, and Transparent Huge Pages (THP), where huge pages are allocated at page fault time or assembled by the khugepaged kernel thread, which scans memory and merges 4 KB pages into 2 MB blocks. Virtualization drivers, such as mshv for Microsoft Hyper-V, emulate giant (1 GB) pages in software: since there is no direct hypercall for mapping 1 GB, the driver breaks such blocks into a sequence of aligned 2 MB chunks, prompting the hypervisor to merge them into a single TLB entry on the host side. In Windows, using large pages requires the SeLockMemoryPrivilege privilege, and the GetLargePageMinimum function returns the minimum large page size, which depends on the processor architecture (2 MB for x64, 4 MB for x86 without PAE).

Security

Using large pages involves a trade-off between performance and exposure to side-channel attacks. On one hand, research shows that 2 MB pages can reduce CPU cycles by approximately 11% for cryptographic workloads without significantly increasing data leakage through the PMU or TLB collisions, as an attacker's page identification accuracy remains at the level of random guessing. On the other hand, locking pages in physical memory creates a resource exhaustion risk: such pages are never swapped out, so a malicious or faulty application that requests an excessive amount of HugePages can cause denial of service (OOM) for the rest of the system. Implementations therefore check privileges before allocation (CAP_IPC_LOCK in Linux or SeLockMemoryPrivilege in Windows). WINE/Proton has also added parameter validation: STATUS_INVALID_PARAMETER is returned if the requested section size is not a multiple of the minimum supported large page size.

Logging

Large page status is monitored through several filesystems and counters exported by the kernel. Transparent huge page statistics are available in /proc/vmstat: the thp_fault_alloc counter increments on each successful huge page allocation during page fault handling, thp_collapse_alloc records each successful merge of scattered pages into a huge page by the khugepaged background daemon, and thp_split_page (thp_split on older kernels) registers each split back into base 4 KB pages, often due to memory pressure or reclamation. For per-process analysis, the /proc/PID/smaps files are used, summing the AnonHugePages fields across mappings; however, reading these files is resource-intensive and creates significant overhead when polled frequently. In cloud environments such as Amazon Aurora, metrics such as os.memory.hugePagesFree and hugePagesTotal are exported directly to monitoring services (Performance Insights), giving administrators one-minute snapshots of reserved page usage.
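
The /proc/vmstat counters can be sampled cheaply, unlike smaps. The following sketch assumes the file's "name value" line format. The function names and the choice of the thp_ and compact_ prefixes are illustrative, and which counters exist depends on the kernel version.

```python
from pathlib import Path

VMSTAT = Path("/proc/vmstat")


def parse_vmstat(text: str, prefixes=("thp_", "compact_")) -> dict:
    """Pick counters whose names start with one of the given prefixes."""
    prefixes = tuple(prefixes)
    counters = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0].startswith(prefixes):
            counters[parts[0]] = int(parts[1])
    return counters


def counter_delta(before: dict, after: dict) -> dict:
    """Per-counter change between two samples; the counters only grow."""
    return {name: value - before.get(name, 0) for name, value in after.items()}


if __name__ == "__main__":
    try:
        snapshot = parse_vmstat(VMSTAT.read_text())
    except OSError:
        snapshot = {}  # not a Linux host, or procfs is not mounted
    for name in ("thp_fault_alloc", "thp_collapse_alloc", "thp_split_page"):
        print(name, snapshot.get(name, "not exported by this kernel"))
```

Because the counters are cumulative since boot, the useful signal is the difference between two samples taken around the workload of interest.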

Limitations

A fundamental limitation is the requirement for physical memory contiguity, which makes huge pages vulnerable to fragmentation: even with sufficient free RAM, the system may fail to allocate a 2 MB or 1 GB block if the free 4 KB pages are scattered. To combat this, the compaction mechanism in Linux migrates pages to free up contiguous ranges, but it has a cost: the compact_stall counter counts the times a process stalled to run compaction, and compact_fail counts attempts that failed, in which case the result is performance degradation instead of the expected gain. Container orchestrators also impose strict isolation rules: Kubernetes requires that HugePages requests and limits match, and the hugepages-2Mi resource does not support overcommit. In addition, an application using shmget() with the SHM_HUGETLB flag must run with the supplementary group specified in hugetlb_shm_group.
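
The gap between "free memory" and "free contiguous memory" can be made visible with /proc/buddyinfo, a file the text above does not mention. It lists, per zone, how many free blocks exist at each allocation order. The sketch assumes 4 KB base pages, so a 2 MB block is order 9, and the function names are illustrative.

```python
def parse_buddyinfo(text: str) -> dict:
    """Return {(node, zone): [free block count for order 0, 1, 2, ...]}."""
    zones = {}
    for line in text.splitlines():
        parts = line.split()
        # Expected shape: Node 0, zone Normal <count per order...>
        if len(parts) < 5 or parts[0] != "Node" or parts[2] != "zone":
            continue
        node = int(parts[1].rstrip(","))
        zones[(node, parts[3])] = [int(count) for count in parts[4:]]
    return zones


def summarize(text: str, huge_order: int = 9, base_page: int = 4096):
    """Return (free bytes, 2 MB blocks obtainable without compaction)."""
    free_bytes = 0
    huge_blocks = 0
    for counts in parse_buddyinfo(text).values():
        for order, count in enumerate(counts):
            free_bytes += count * (base_page << order)
            if order >= huge_order:
                # One order-10 block splits into two order-9 blocks.
                huge_blocks += count << (order - huge_order)
    return free_bytes, huge_blocks


if __name__ == "__main__":
    # Hypothetical fragmented zone: 48 MB free, none of it contiguous enough.
    fragmented = "Node 0, zone Normal 4096 2048 1024 0 0 0 0 0 0 0 0"
    free_bytes, huge_blocks = summarize(fragmented)
    print(f"{free_bytes // (1024 * 1024)} MB free, {huge_blocks} huge-page-sized blocks")
```

A zone like the hypothetical one in the demo has plenty of free memory yet cannot supply a single 2 MB page until compaction succeeds. The figure is only a rough indicator, since compaction may still assemble blocks from movable pages.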

History and evolution

Large pages began as a specific hack for databases on x86 systems to reduce TLB misses, when administrators manually reserved buffers through the legacy Bigpages mechanism, later replaced by hugetlbfs in kernel 2.6. A key architectural shift was the introduction of Transparent Huge Pages (THP), which let unprivileged applications obtain large pages automatically without code modification; however, due to defragmentation issues, many high-load services (Redis, MongoDB) recommend disabling THP and relying only on statically pre-allocated pages where huge pages are needed. Modern development is moving toward support for multiple page sizes: Linux patches for Hyper-V implement emulation of 1 GB pages via 2 MB-aligned chunks, and the AArch64 architecture supports 64 KB base pages directly in the processor's translation hardware.