Ballooning (virtio-balloon) is a virtualization driver that allows the hypervisor to take away part of a guest machine’s RAM when it is idle and return it when the load increases. This helps run more virtual servers on a single physical machine without hard fixed limits.
This technology is actively used in dynamic resource consolidation environments such as cloud platforms and container orchestration systems. Administrators use it for memory overcommitment, where the total amount of RAM allocated to virtual machines exceeds the host’s physical memory. It is useful in overprovisioning scenarios and for freeing host memory, for example when migrating virtual machines, without stopping them.
The main challenge involves latency in memory allocation inside the guest during sudden spikes in consumption, which leads to swap usage and performance degradation. In real-time systems, ballooning can cause unpredictable latency spikes. A secondary issue arises when ballooning works alongside guest memory management systems that compress pages or move them to swap, conflicting with the driver and causing instability.
How it works
The principle is based on interaction between the guest OS driver and the virtio-balloon device emulated by QEMU. To inflate the balloon, the guest driver allocates pages from the guest kernel and passes their page frame numbers to the hypervisor through a shared virtio queue, signalling that the guest is ready to give those pages up. The hypervisor removes those pages from the VM’s working set using the madvise system call with the MADV_DONTNEED flag, returning memory to the host OS. The reverse process begins when the guest needs more resources or the host commands it: the host lowers the balloon target, the driver reports the pages it is taking back through the deflate queue, the hypervisor backs them with physical memory again, and the guest starts using them, possibly with a delay until the pages are initialized. The mechanism relies on free memory statistics inside the guest and does not require modifications to applications running in the virtual environment.
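The host-side half of this exchange can be tried from user space. The following is a minimal sketch, assuming Linux and Python 3.8 or later; the function name discard_demo is made up for illustration, and a private anonymous mapping stands in for guest RAM. It shows only the effect of MADV_DONTNEED, not how QEMU itself is structured.

```python
import mmap
import sys


def discard_demo(num_pages=4):
    """Return True if pages read back as zeros after MADV_DONTNEED.

    Returns None where the call is unavailable or has different semantics
    (the zero-fill behaviour checked here is specific to Linux).
    """
    if not sys.platform.startswith("linux"):
        return None
    if not hasattr(mmap, "MADV_DONTNEED") or not hasattr(mmap.mmap, "madvise"):
        return None

    length = num_pages * mmap.PAGESIZE
    # Private anonymous mapping: a user-space stand-in for guest RAM.
    buf = mmap.mmap(-1, length, flags=mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS)
    try:
        buf[:] = b"\xaa" * length        # touch every page so it is backed by RAM
        buf.madvise(mmap.MADV_DONTNEED)  # give the backing pages back to the OS
        # The old contents are gone; the next access is served by fresh zero pages.
        return buf[:] == bytes(length)
    finally:
        buf.close()


if __name__ == "__main__":
    print(discard_demo())
```

The mapping stays valid after the call, which is why a guest that touches a ballooned page again does not crash the host: the page is simply faulted back in, empty.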
Virtio-balloon features
- Page transfer to host. The guest driver allocates memory pages with a flag that prevents recursive OOM calls and reduces the total RAM counter. Pointers to these pages are placed into an array, which is then sent through the inflate_vq queue.
- Page release to guest. When the host requests deflation, the driver retrieves pages from its internal list, removing them from accounting. First, an array of page numbers is sent to the host via the deflate_vq queue, and only then are the pages freed back to the guest system’s allocator.
- Main flow control loop. A kernel thread inside the guest waits for events on the config_change queue. When it receives a signal, it calculates the difference between the host’s target value and the current balloon size. Depending on the sign of this difference, it calls the inflate or deflate function.
- On-demand statistics handling. The stats_vq queue works in reverse mode: the host sends a request, the driver receives it and sets a flag indicating that statistics need updating. The main thread wakes up, collects statistics via update_balloon_stats, and sends the structure back to the host.
- Guest memory statistics collection. The update_balloon_stats function aggregates key metrics: swap in/out volume, number of major and minor page faults, free and total memory. Values are converted to bytes and stored in an array readable by the host.
- Forced notification to host. The VIRTIO_BALLOON_F_MUST_TELL_HOST flag requires the guest to send the numbers of pages being freed before they are actually returned to the system. This ensures the guest does not start using pages that the host still treats as part of the balloon, preventing data corruption.
- Automatic deflation on OOM. When the VIRTIO_BALLOON_F_DEFLATE_ON_OOM flag is set, the driver registers a handler with the kernel’s OOM subsystem. When the guest runs out of memory, the handler forcibly returns some pages from the balloon to prevent processes from being killed.
- Free page reporting. The VIRTIO_BALLOON_F_PAGE_REPORTING flag enables asynchronous reporting of unused guest pages to the host. The guest kernel scans free lists and notifies the hypervisor, which can then apply MADV_DONTNEED to reclaim them without active deflation.
- Configuration space format. Interaction is built around the virtio_balloon_config structure. It contains two key fields: num_pages (target number of pages set by the host) and actual (current number of pages held by the driver). Exchange occurs via virtio mechanisms.
- Internal balloon representation. The virtio_balloon structure holds pointers to three virtqueues: inflate, deflate, and statistics. It tracks the current page count num_pages, a linked list of captured pages, and a page frame number (pfn) buffer of up to 256 elements.
- Page frame conversion. The static function page_to_balloon_pfn converts a kernel page address into a frame number understood by the hypervisor. It shifts the value to align the guest OS page size with the standard balloon page size, typically 4 KB.
- Batch size limits. When inflating or deflating the balloon, processing uses fixed-length pfn arrays. The maximum number of pages transferred in a single tell_host call is limited to 256 elements, which allows efficient packing of requests to the hypervisor.
- Synchronous operation acknowledgment. After sending a frame array via virtqueue, the driver waits for completion. The host must acknowledge receipt of the buffer, after which the guest side can continue modifying its page list.
- Recursive eviction problem. Using ballooning together with a shrinker can cause a busy no-op loop: the guest allocates a page for the balloon, creating memory pressure. This activates the shrinker, which takes the page back, reducing inflate progress to zero.
- OOM handler priority. To prevent infinite memory allocation loops, the deflation handler is given high priority. When the OOM killer triggers, the system first tries to free memory from the balloon, and only if memory is still insufficient does it kill processes.
- Page Reporting vs Hinting. Hinting is mainly used during live migration to speed up memory transfer. Reporting operates during normal VM operation, allowing the hypervisor to aggressively reclaim unused memory without stopping the guest.
- Monitoring via procfs. Driver statistics, including inflate and deflate counters, are exported to /proc/vmstat. This allows system administrators to monitor balloon activity without access to the hypervisor by analyzing parameters such as balloon_inflate and balloon_deflate.
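The control loop, the 256-entry batch limit, and the frame number conversion described in the list can be put together in a toy model. This is a user-space sketch, not the kernel driver: the class ToyBalloon, its method names, and the free_pfns list standing in for the guest allocator are all invented for illustration, and "telling the host" is reduced to appending to a list.

```python
BALLOON_PFN_SHIFT = 12   # balloon pages are 4 KiB (2**12 bytes)
ARRAY_PFNS_MAX = 256     # per-request pfn batch limit described above


def page_to_balloon_pfns(guest_pfn, guest_page_shift=12):
    """Expand one guest page frame number into its 4 KiB balloon pfns."""
    per_page = 1 << (guest_page_shift - BALLOON_PFN_SHIFT)
    first = guest_pfn * per_page
    return list(range(first, first + per_page))


class ToyBalloon:
    def __init__(self):
        self.pages = []      # guest pfns currently held by the balloon
        self.requests = []   # (queue name, pfn batch) as "sent" to the host

    @property
    def actual(self):
        return len(self.pages)

    def _tell_host(self, queue, pfns):
        self.requests.append((queue, list(pfns)))

    def inflate(self, free_pfns, count):
        count = min(count, ARRAY_PFNS_MAX, len(free_pfns))
        batch = [free_pfns.pop() for _ in range(count)]
        self.pages.extend(batch)
        self._tell_host("inflate_vq", batch)
        return count

    def deflate(self, free_pfns, count):
        count = min(count, ARRAY_PFNS_MAX, len(self.pages))
        batch = [self.pages.pop() for _ in range(count)]
        self._tell_host("deflate_vq", batch)   # tell the host first ...
        free_pfns.extend(batch)                # ... then free to the allocator
        return count

    def run_until(self, target, free_pfns):
        """Move towards the host's target, one batch at a time."""
        while True:
            diff = target - self.actual
            if diff > 0:
                moved = self.inflate(free_pfns, diff)
            elif diff < 0:
                moved = self.deflate(free_pfns, -diff)
            else:
                break
            if moved == 0:
                break   # the guest has no free pages left to give


if __name__ == "__main__":
    free = list(range(1000))
    balloon = ToyBalloon()
    balloon.run_until(600, free)
    print(balloon.actual, [len(p) for _, p in balloon.requests])
```

With a target of 600 pages the model issues three inflate requests of 256, 256, and 88 entries. The sketch assumes 4 KiB guest pages, so one guest page maps to one balloon pfn; page_to_balloon_pfns shows how a larger guest page (for example guest_page_shift=16) expands into several 4 KiB entries.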
Comparisons
- virtio-balloon vs DIMM-based hotplug. virtio-balloon does not change the guest’s hardware topology, adding memory transparently to the OS, whereas the DIMM approach emulates physical installation of memory modules requiring ACPI support. This makes ballooning more flexible for frequent changes but prevents the guest OS from using added memory in specific NUMA nodes, unlike hot-adding real DIMM devices.
- virtio-balloon vs virtio-mem. The main advantage of virtio-mem over ballooning is deterministic management: ballooning relies on guest cooperation, which may not return memory in time, while virtio-mem operates as a virtual device that guarantees allocation of memory blocks of a specified size (e.g., 4 MiB) and interacts correctly with the NUMA subsystem, which is critical for performance.
- virtio-balloon vs KSM. Unlike KSM, which works as a background process for deduplicating identical pages on the host without guest awareness, virtio-balloon requires active participation of the guest driver to explicitly inflate or deflate the balloon. This makes ballooning controllable and predictable in terms of how much memory is freed, but less transparent to applications inside the VM than KSM’s unobtrusive operation.
- virtio-balloon vs memory cgroup. Memory cgroup limits memory consumption by a group of processes inside the guest at the kernel level, while virtio-balloon changes the total memory available to the entire VM by handing it back to the hypervisor. Thus, cgroups solve resource prioritization within a single system, while ballooning addresses global memory overcommitment on the physical host by returning unused pages to the hypervisor level.
- virtio-balloon vs guest swap. Compressing memory or swapping to a disk partition inside the guest allows the OS to handle temporary memory shortages using its own resources, whereas ballooning forcibly reclaims pages, forcing the guest to use internal mechanisms more aggressively, such as swap or the OOM killer. The key difference is that ballooning acts under external control from the host, while swapping is fully controlled by the guest system.
OS and driver support
Virtio-balloon is implemented according to the Virtio specification and requires a guest driver that interacts with the host-side device (QEMU) via the kernel API. Although the core code is cross-platform, the driver relies on OS-specific counters for extended memory statistics, so the reported set differs between guests: for example, only SystemCache on Windows, or zeros for Linux parameters that are not meaningful, such as buffers.
Security
The basic model assumes full trust in the guest, as the host cannot use hardware to stop the guest from touching pages it has previously given to the balloon. The deflate_on_oom feature, however, acts as a safeguard for the guest: when the guest runs out of memory, the driver forcibly deflates the balloon regardless of host requests to avoid process crashes due to artificial memory starvation.
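The ordering implied by this safeguard (balloon first, process killing only as a last resort) can be written down as a small decision function. This is an illustrative model only: handle_oom, its parameters, and the default of 256 pages released per event are assumptions made for the sketch, not values taken from the text above.

```python
def handle_oom(needed, balloon_pages, deflate_on_oom, release=256):
    """Model one guest OOM event.

    needed          pages the guest must find to satisfy the allocation
    balloon_pages   pages currently held by the balloon
    deflate_on_oom  whether the feature was negotiated
    release         assumed upper bound on pages returned per event
    """
    freed = min(release, balloon_pages) if deflate_on_oom else 0
    return {
        "freed_from_balloon": freed,
        "balloon_pages": balloon_pages - freed,
        # Processes are killed only if the balloon could not cover the need.
        "kill_process": freed < needed,
    }


if __name__ == "__main__":
    print(handle_oom(needed=100, balloon_pages=1000, deflate_on_oom=True))
    print(handle_oom(needed=100, balloon_pages=1000, deflate_on_oom=False))
```

In the first call the balloon covers the shortfall and no process is killed; in the second, without the feature, the balloon keeps all of its pages and the OOM killer proceeds.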
Logging
For monitoring, the host periodically polls guest statistics through a separate stats queue. If the guest has not yet updated the data, QEMU does not generate an error but sets the last-update field to 0 and returns -1 for all numeric metrics, allowing libraries such as libvirt to ignore empty responses without parsing error text.
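A consumer of these statistics can apply the same convention without any error parsing. The sketch below assumes the reply has already been decoded into a dictionary with a "last-update" timestamp and a "stats" mapping; the helper name usable_stats and the exact key names in the example are illustrative and should be checked against the QEMU version in use.

```python
def usable_stats(reply):
    """Return only the metrics the guest has actually reported.

    An empty dict means the guest has not delivered any statistics yet.
    """
    if reply.get("last-update", 0) == 0:
        return {}
    # -1 marks a metric the guest did not fill in.
    return {name: value
            for name, value in reply.get("stats", {}).items()
            if value != -1}


if __name__ == "__main__":
    not_ready = {"last-update": 0,
                 "stats": {"stat-free-memory": -1, "stat-swap-in": -1}}
    ready = {"last-update": 1700000000,
             "stats": {"stat-free-memory": 844943360, "stat-swap-in": -1}}
    print(usable_stats(not_ready))
    print(usable_stats(ready))
```

Treating "no data yet" as an empty result rather than an exception keeps a polling loop simple: it can skip the sample and try again on the next interval.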
Limitations
The device operates at a fixed page size (typically 4 KB) without support for huge pages, which fragments guest physical memory. It also lacks NUMA awareness entirely. Furthermore, the architecture does not support hardware-level memory revocation from the guest, which makes the mechanism unsuitable for direct device passthrough (VFIO/PCI passthrough).
History and evolution
The technology first appeared in 2008 as a means of redistributing memory by forcibly allocating (inflating) and releasing (deflating) pages inside the guest. By the 2020s it had evolved from simple cooperative management to asynchronous techniques such as free page reporting, which lets the host reclaim only unused pages without pushing the guest toward OOM, and it now significantly accelerates live migration.