DPDK (High speed shared memory access)

DPDK with vhost-user gives a virtual machine a fast path to the network that bypasses the networking stack of the host kernel. The guest Virtio driver and the DPDK process on the host share a common memory area and exchange packets through it with a single copy and, on the polling path, without system calls or context switches.

The primary application area is telecommunications systems and cloud services, where network throughput is critically important. The technology forms the basis of virtualized network functions (VNF) in NFV architectures, such as virtual routers, firewalls, and load balancers. Telecom operators use it through DPDK-accelerated switches such as Open vSwitch to pass traffic to guest instances when the kernel-based Virtio backend (vhost-net with TAP devices) becomes a bottleneck and adds excessive latency.

Difficulties often arise when allocating hugepage memory. If too few HugePages are configured on the host, the virtual machine will not start, and the DPDK service will refuse to initialize. Multi-queue requires the number of queue pairs enabled in the guest to match the number configured on the host; otherwise, some queues remain idle, causing uneven CPU core utilization and traffic microbursts. A serious engineering mistake is a NUMA topology mismatch: when memory is allocated on a socket remote from the processor that polls it, latency increases unpredictably, negating the speedup from kernel bypass.
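
The hugepage check described above can be sketched as a small pre-flight test. This is only an illustration: the helper names (meminfo_field, hugepages_needed, hugepages_sufficient) are hypothetical, and the sketch assumes the usual /proc/meminfo field names HugePages_Free and Hugepagesize, with the text of that file passed in as a string.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Return the numeric value of one "Key:   value" line from
 * /proc/meminfo-style text, or -1 if the key is absent. */
long meminfo_field(const char *text, const char *key)
{
    size_t klen = strlen(key);
    const char *line = text;

    while (line != NULL && *line != '\0') {
        if (strncmp(line, key, klen) == 0 && line[klen] == ':') {
            long value = -1;
            if (sscanf(line + klen + 1, "%ld", &value) == 1)
                return value;
            return -1;
        }
        line = strchr(line, '\n');
        if (line != NULL)
            line++;                 /* step past the newline */
    }
    return -1;
}

/* Hugepages needed to back guest_mb MiB of guest RAM, rounded up.
 * hugepage_kb is the page size in KiB (the Hugepagesize field). */
long hugepages_needed(long guest_mb, long hugepage_kb)
{
    assert(guest_mb >= 0 && hugepage_kb > 0);
    long guest_kb = guest_mb * 1024;
    return (guest_kb + hugepage_kb - 1) / hugepage_kb;
}

/* Return 1 if the free hugepages reported in text can back the guest. */
int hugepages_sufficient(const char *text, long guest_mb)
{
    long free_pages = meminfo_field(text, "HugePages_Free");
    long page_kb = meminfo_field(text, "Hugepagesize");

    if (free_pages < 0 || page_kb <= 0)
        return 0;
    return free_pages >= hugepages_needed(guest_mb, page_kb);
}
```

A launcher script or agent could read /proc/meminfo into a buffer and call hugepages_sufficient before starting QEMU. The check ignores NUMA placement, so it only catches the simplest failure mode.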

How DPDK works

The operating principle is based on direct packet transfer through shared memory without hardware emulation. In the common setup, the backend (usually a DPDK application on the host) creates a Unix server socket. When the virtual machine starts with a vhost-user interface, QEMU connects to this socket as a client. An initial handshake follows over the vhost-user protocol, whose main goal is to agree on the guest memory areas. The host receives the location of the Virtio rings and the table of guest memory regions; the file descriptors that back those regions are passed over the socket as ancillary data.
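
Passing a file descriptor over a Unix socket, as the handshake does for guest memory regions, uses the standard SCM_RIGHTS ancillary message. The following is a minimal sketch of that one mechanism and not the vhost-user message format: the helper names send_fd and recv_fd are hypothetical, and a real message carries a protocol header and several descriptors instead of a single byte and a single descriptor.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Control buffer large enough for exactly one descriptor. */
union fd_ctrl {
    char buf[CMSG_SPACE(sizeof(int))];
    struct cmsghdr align;           /* forces correct alignment */
};

/* Send one descriptor plus a one-byte payload. Returns 0 on success. */
int send_fd(int sock, int fd)
{
    char byte = 'F';
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union fd_ctrl ctrl;
    struct msghdr msg;

    memset(&ctrl, 0, sizeof(ctrl));
    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl.buf;
    msg.msg_controllen = sizeof(ctrl.buf);

    struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;   /* "this message carries descriptors" */
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive one descriptor. Returns the new descriptor, or -1 on error. */
int recv_fd(int sock)
{
    char byte;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union fd_ctrl ctrl;
    struct msghdr msg;
    int fd = -1;

    memset(&ctrl, 0, sizeof(ctrl));
    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl.buf;
    msg.msg_controllen = sizeof(ctrl.buf);

    if (recvmsg(sock, &msg, 0) != 1)
        return -1;

    struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
    if (cmsg == NULL || cmsg->cmsg_level != SOL_SOCKET ||
        cmsg->cmsg_type != SCM_RIGHTS ||
        cmsg->cmsg_len != CMSG_LEN(sizeof(int)))
        return -1;

    memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
    return fd;
}
```

The receiver gets its own descriptor number that refers to the same open file, which is what lets the backend mmap memory that QEMU allocated.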

The key mechanism is that the DPDK application, using a poll mode driver, maps the virtual machine memory directly into its own address space via mmap. To deliver a packet to the guest, the host application takes an empty buffer that the guest has posted in the available ring, copies the payload into it through the backend's virtual address for the shared region, writes an entry to the used ring, and advances the used ring index. A guest driver that polls the ring sees the new entry without a single interrupt; otherwise the backend sends a notification. All interaction occurs through ordered reads and writes of ring indices in shared memory, which eliminates hypervisor mode switches and leaves a single copy between host and guest buffers instead of the multiple copies of an emulated device path.
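
The host-to-guest step described above can be sketched with a simplified split ring. This is an illustration under stated assumptions, not DPDK code: the structure and function names (vq, enqueue_one) are hypothetical, the ring has four entries, descriptor chains and the Virtio net header are ignored, and guest physical address 0 is assumed to be mapped at guest_base. The index fields are declared _Atomic to express the required ordering in C11; the real Virtio layout uses plain 16-bit little-endian fields with explicit barriers.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>
#include <string.h>

#define RING_SIZE 4u    /* split rings use a power-of-two size */

struct desc { uint64_t addr; uint32_t len; uint16_t flags; uint16_t next; };
struct avail_ring { uint16_t flags; _Atomic uint16_t idx; uint16_t ring[RING_SIZE]; };
struct used_elem { uint32_t id; uint32_t len; };
struct used_ring { uint16_t flags; _Atomic uint16_t idx; struct used_elem ring[RING_SIZE]; };

struct vq {
    struct desc *desc;
    struct avail_ring *avail;
    struct used_ring *used;
    uint8_t *guest_base;    /* backend mapping of guest physical address 0 */
    uint64_t guest_size;    /* size of that mapping in bytes */
    uint16_t last_avail;    /* backend-private: next available slot to read */
};

/* Copy one packet into the next buffer posted by the guest.
 * Returns 1 if delivered, 0 if no usable buffer was available. */
int enqueue_one(struct vq *q, const void *pkt, uint32_t len)
{
    /* Acquire pairs with the guest's release of avail->idx, so the
     * descriptor contents are visible before we read them. */
    uint16_t avail_idx = atomic_load_explicit(&q->avail->idx,
                                              memory_order_acquire);
    if (avail_idx == q->last_avail)
        return 0;                           /* nothing posted */

    uint16_t head = q->avail->ring[q->last_avail % RING_SIZE];
    if (head >= RING_SIZE)
        return 0;                           /* never trust guest indices */

    const struct desc *d = &q->desc[head];
    if (len > d->len || d->addr > q->guest_size ||
        len > q->guest_size - d->addr)
        return 0;                           /* buffer too small or out of range */

    memcpy(q->guest_base + d->addr, pkt, len);   /* the single data copy */

    uint16_t used_idx = atomic_load_explicit(&q->used->idx,
                                             memory_order_relaxed);
    q->used->ring[used_idx % RING_SIZE].id = head;
    q->used->ring[used_idx % RING_SIZE].len = len;
    q->last_avail++;

    /* Release makes the payload and the used element visible to the
     * guest before it can observe the new index. */
    atomic_store_explicit(&q->used->idx, (uint16_t)(used_idx + 1),
                          memory_order_release);
    return 1;
}
```

The bounds checks matter as much as the copy: every address and index in the rings is written by the guest and has to be validated before the backend dereferences it.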

DPDK functionality

  1. Virtual device initialization in the PMD. The poll mode driver attaches to the guest memory that QEMU shares over the vhost-user socket. At this stage, transmit and receive queues are set up, and callbacks for device events such as connection and removal are registered. An interrupt handler can also be registered for the optional interrupt mode, although the main loop normally operates in poll mode without notifications.
  2. Feature negotiation. During the connection process, the Virtio frontend and vhost-user backend agree on a common set of capabilities. Support for mergeable buffers, checksum and segmentation offloads, and multi-queue configurations is checked. If the guest driver negotiates VIRTIO_F_VERSION_1, the device operates in the modern (Virtio 1.0) mode. The ring layout is selected separately: split rings are the default, and packed rings are used only when VIRTIO_F_RING_PACKED is also negotiated.
  3. Managing ring structures in guest address space. The vhost-user backend maps guest memory via mmap and translates the addresses it receives (guest physical addresses in descriptors, QEMU virtual addresses for the rings) into pointers in its own virtual address space. As a result, the DPDK application gains direct access to Virtio descriptor tables, available rings, and used rings for the transmitter and receiver. This mapping takes QEMU out of the data path and delegates queue processing entirely to user space.
  4. Processing the memory table via the VHOST_USER_SET_MEM_TABLE message. During device setup, QEMU sends the guest memory layout to the backend via the Unix socket. The message lists the memory regions with their guest physical address, size, QEMU virtual address, and mmap offset; the matching file descriptors are passed as ancillary data. The application maps these regions into its own address space, which gives it direct visibility of guest buffers for subsequent accelerated processing of network packets.
  5. Configuring Virtio queues. The VHOST_USER_SET_VRING_NUM message tells the backend the queue size chosen by the frontend. This is followed by VHOST_USER_SET_VRING_ADDR with the addresses of the descriptor table, available ring, and used ring, plus flags. When VHOST_USER_SET_VRING_KICK delivers the eventfd for the ring, the application can set up a watcher on that file descriptor to learn when the guest adds new buffers to the available ring.
  6. Kick mechanism and notification suppression. When the guest driver adds buffers to a queue and notifications are enabled, it signals the backend via the kick fd. A backend running in interrupt mode catches this event via epoll and then drains the descriptors; a busy-polling backend reads the ring directly and does not depend on the kick. To reduce overhead, dynamic notification suppression is applied: while the backend is continuously processing the ring, it tells the guest not to kick (through a flag in the used ring or the event index mechanism) and re-enables notifications only when it runs out of descriptors and is about to wait.
  7. Extracting Virtio Net headers. Each packet buffer starts with a virtio_net_hdr structure. This area contains the segmentation (GSO) type and size, the checksum start and offset, and the number of merged buffers. The function that dequeues packets from the guest inspects this header before passing the packet on: it separates the payload and records the requested offloads in the mbuf metadata.
  8. Direct writing to guest receive buffers. After receiving a packet from the physical port, the application finds a free buffer in the available ring of the guest receive queue. Using the guest physical address and length from the descriptor, translated into a backend virtual address, the driver copies data directly via memcpy into the guest buffer, bypassing intermediate kernel and hypervisor queues. After the copy, a write memory barrier is issued and the used ring index is updated.
  9. Software handling of segmentation offloads (TSO/LRO). If the physical network adapter does not support segmentation, large segments have to be split in software. The data path then produces frames that respect the MSS limit, fills the chain of guest descriptors accordingly, and sets the VIRTIO_NET_HDR_F_NEEDS_CSUM flag in the Virtio header when the checksum is left for the receiver to complete. This logic runs within the transmit path, so complete frames are prepared before they are placed in the ring.
  10. Support for guest live migration. In vhost-user mode, the application must support dirty page logging. The capability is negotiated as a feature (VHOST_F_LOG_ALL), and the VHOST_USER_SET_LOG_BASE message passes a shared-memory bitmap in which the backend must atomically mark guest pages after each write to guest buffers and to the used ring, ensuring that memory and queue state are transferred consistently to the target node.
  11. Data integrity protection via barriers. Because the rings are shared between CPUs that may reorder memory accesses, working with the Virtio available and used rings requires explicit memory barriers. The driver places a write barrier before updating the used index so that all data in the packet body becomes visible to the guest before the index changes. Similarly, when reading incoming descriptors, a read barrier is required between reading the available index and accessing the descriptor content.
  12. Injecting interrupts via irqfd. To signal that buffers have been placed in the guest receive queue, the backend writes a value of 1 to the eventfd (call fd) associated with the guest MSI-X vector. Unlike the classic emulation model, QEMU is not involved: the eventfd is registered with KVM as an irqfd, so the write system call triggers the virtual interrupt injection directly in the kernel, minimizing the guest application wake-up latency.
  14. Asynchronous data copying (async vhost). Recent DPDK releases include an asynchronous data path in which the CPU core does not wait for each copy to finish. Copies between host buffers and guest regions are offloaded to DMA engines such as Intel DSA, so data moves without the processing core doing the copy, and completion of the hardware copy is reported back to the application afterwards.
  15. Ring integrity monitoring and broken ring detection. When the guest modifies indices incorrectly or supplies descriptors that point outside its memory, the driver must stop consuming from that queue instead of following the invalid values. The PMD logic isolates the affected queue when it detects a fatal mismatch and can expose the condition to the orchestrator through metrics. This prevents the entire processing fabric from hanging due to a failure in a single client driver.
  16. Multi-queue and RSS configuration in the guest environment. The host hardware classifier distributes flows across NIC queues, which the application maps to vhost port queues. The frontend uses the VHOST_USER_SET_VRING_ENABLE message to tell the backend which queue pairs the guest has enabled, so DPDK processes only the queues that are in use. RSS hashing on the host side, optionally refined with rte_flow rules that steer flows to specific queues, keeps each flow on one queue and therefore on one guest vCPU, which balances traffic without software packet reordering inside the virtual machine.
  17. MTU management and size validity checks. When the MTU feature is negotiated, the maximum MTU is exposed to the guest through the virtio_net_config structure in the device configuration space. When an attempt is made to transmit a packet that exceeds what the guest Virtio driver can accept, the vhost port drops the frame before placing it in the ring, saving resources on useless descriptor packing that the frontend would reject anyway upon MTU check.
  18. Isolation of control traffic via the control queue. A dedicated Virtio ring carries management commands separately from the main traffic. It handles requests to update the MAC address table, commands to set VLAN filters, and changes to the number of active queue pairs. These commands are parsed outside the fast path (in a QEMU vhost-user setup typically by QEMU, which relays the relevant state to the backend), so changes to forwarding state do not block the data queues.
  19. Using external buffers. For scenarios where packet data does not live in the regular mbuf pool, the driver can work with mbufs whose data is held in an externally attached buffer. Addresses that refer to guest memory are resolved with rte_vhost_va_from_guest_pa, which translates a guest physical address into a virtual address within the backend's mapping, so that data can be referenced in place. This suits zero-copy transfer for containers and unikernels.
  20. Backend reconnect control. This functionality provides fault tolerance: if QEMU unexpectedly closes the session, the DPDK application does not terminate but puts the port into a waiting state. Upon reconnection, the memory and ring mappings are re-established through a new handshake, without the application having to re-create the port. Packets still held in host-side buffers can be kept until the virtual machine resumes.
  21. Operation in packed ring mode. The Virtio 1.1 specification adds a packed ring format in which descriptors and their availability flags share a single contiguous memory area. The batched packed-ring enqueue path eliminates redundant memory accesses when processing packets in bursts, since status is read and written in one descriptor ring, often within a single cache line, instead of three separate tables.
  22. Statistics export and monitoring. Counters of transmitted and received packets are maintained per queue. The xstats interface can additionally export per-queue error counters, ring miss counters, and notification-related counters, providing orchestration tools with data for making rebalancing decisions for virtual network functions.
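
Three of the items above (6, 7 and 10) come down to small pieces of arithmetic that are easy to get wrong, so a sketch may help. The constants and the header layout follow the Virtio specification, and the event-index comparison and the one-bit-per-4-KiB-page log follow the Virtio and vhost-user specifications. Everything else is an assumption for illustration: the names offload_hint, parse_net_hdr, log_dirty and need_event are hypothetical, and offload_hint merely stands in for the offload fields a backend would record in its packet metadata.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define VIRTIO_NET_HDR_F_NEEDS_CSUM 1
#define VIRTIO_NET_HDR_GSO_NONE     0
#define VIRTIO_NET_HDR_GSO_TCPV4    1
#define VIRTIO_NET_HDR_GSO_TCPV6    4
#define VIRTIO_NET_HDR_GSO_ECN      0x80

/* Header that precedes each packet (layout with num_buffers). */
struct virtio_net_hdr {
    uint8_t  flags;
    uint8_t  gso_type;
    uint16_t hdr_len;
    uint16_t gso_size;
    uint16_t csum_start;
    uint16_t csum_offset;
    uint16_t num_buffers;
};

/* Hypothetical summary of what the backend needs from the header. */
struct offload_hint {
    int needs_csum;     /* checksum still has to be completed */
    int tso;            /* packet is a large TCP segment */
    uint16_t mss;       /* segment size to cut it into */
    uint16_t csum_at;   /* byte offset where the checksum is stored */
};

/* Item 7: turn the Virtio net header into offload hints. */
struct offload_hint parse_net_hdr(const struct virtio_net_hdr *h)
{
    struct offload_hint o = { 0, 0, 0, 0 };
    uint8_t gso = h->gso_type & (uint8_t)~VIRTIO_NET_HDR_GSO_ECN;

    if (h->flags & VIRTIO_NET_HDR_F_NEEDS_CSUM) {
        o.needs_csum = 1;
        o.csum_at = (uint16_t)(h->csum_start + h->csum_offset);
    }
    if (gso == VIRTIO_NET_HDR_GSO_TCPV4 || gso == VIRTIO_NET_HDR_GSO_TCPV6) {
        o.tso = 1;
        o.mss = h->gso_size;
    }
    return o;
}

#define LOG_PAGE_SIZE 4096u     /* one log bit per 4 KiB guest page */

/* Item 10: mark every page touched by [gpa, gpa + len) as dirty. */
void log_dirty(_Atomic uint8_t *bitmap, uint64_t gpa, uint64_t len)
{
    if (len == 0)
        return;
    uint64_t first = gpa / LOG_PAGE_SIZE;
    uint64_t last = (gpa + len - 1) / LOG_PAGE_SIZE;

    for (uint64_t page = first; page <= last; page++)
        atomic_fetch_or(&bitmap[page / 8], (uint8_t)(1u << (page % 8)));
}

/* Item 6: with event index enabled, notify only if the index the other
 * side asked about was crossed while moving from old_idx to new_idx.
 * The 16-bit casts make the comparison safe across wrap-around. */
int need_event(uint16_t event_idx, uint16_t new_idx, uint16_t old_idx)
{
    return (uint16_t)(new_idx - event_idx - 1) < (uint16_t)(new_idx - old_idx);
}
```

The atomic OR in log_dirty matters because several backend threads and QEMU may touch the same bitmap byte concurrently during migration.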

Comparisons

  • DPDK vs AF_XDP. DPDK implements a complete kernel bypass, handing control of the network adapter directly to the user application, which provides maximum performance but loses compatibility with OS networking tools. AF_XDP, in turn, offers a balanced approach, relying on an early XDP hook in the driver to redirect packets to a shared ring buffer. It is often quoted as delivering roughly 70-80% of DPDK throughput, although results depend on the NIC, driver, and workload, while maintaining integration with the Linux network stack and operation in Kubernetes containers without the need for isolated cores.
  • DPDK vs io_uring. DPDK was created as a highly specialized solution for high-speed packet processing by directly accessing hardware while bypassing the kernel, but it takes over the entire network adapter. The io_uring subsystem, originally designed for asynchronous storage I/O, provides an efficient interface for socket-based network operations, minimizing the number of system calls. In some TCP traffic tests, io_uring has reportedly matched or exceeded a DPDK backend without taking over the hardware; such results are specific to the scenario and should not be generalized.
  • DPDK vs KNI. KNI was developed as a bridge to send traffic from the DPDK fast path to the standard Linux kernel network stack, but it relies on an out-of-tree kernel module with a dedicated kernel thread that was never merged into the mainline kernel. The DPDK fast path itself isolates traffic from the OS and works directly with the network adapter, which eliminates context switching overhead. The DPDK community treats KNI as a deprecated mechanism (it was removed in DPDK 23.11) and recommends virtio_user as a modern replacement for interacting with the kernel.
  • DPDK vs RDMA (RoCE). DPDK optimizes traditional Ethernet packet processing, providing libraries for building network functions such as firewalls and load balancers. RDMA over Converged Ethernet (RoCE) provides direct remote memory access, which gives low and stable latency on a properly configured lossless network and reduces CPU load. Unlike DPDK, RDMA normally relies on RDMA-capable network cards and is better suited to distributed computing and storage system tasks than to packet routing.
  • DPDK vs XDP. XDP executes eBPF programs at the earliest stage of packet processing in the driver, allowing traffic to be dropped, modified, or redirected without copying to user space, which is ideal for simple functions like DDoS filtering. DPDK moves the entire packet into a user-space application for complex processing such as stateful load balancing with large connection tables. The instruction and complexity limits of eBPF make DPDK the more practical choice for resource-intensive logic that requires deep analysis of session state.

OS and driver support

The DPDK core runs in user space on Linux and FreeBSD; the vhost-user library itself targets Linux. It requires no dedicated kernel driver, because data exchange with the virtual machine uses ordinary shared memory, typically backed by hugepages. The network card (NIC) must be managed by a compatible poll mode driver (PMD) that bypasses the OS kernel and accesses the device through UIO (Userspace I/O) or VFIO (Virtual Function I/O). When VFIO is used where no IOMMU is available, for example inside a guest without a virtual IOMMU, the vfio-pci kernel module may need to be used in no-IOMMU mode.

Security

vhost-user security relies primarily on strict control of which guest memory the backend may access. With the RTE_VHOST_USER_IOMMU_SUPPORT flag set when the driver is registered, DPDK accesses only those memory areas that the guest has mapped through its virtual IOMMU, so the virtual device cannot read or write guest memory that the guest driver has not exposed to it. Synchronization of the related messages relies on the REPLY_ACK protocol feature of vhost-user. Whether IOMMU support can be enabled may depend on the QEMU version, as older versions of the emulator have bugs in the implementation of this feature.
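
The access check behind this can be pictured as a lookup in a small translation cache. The sketch below is an assumption-laden illustration and not the DPDK implementation: the structure iotlb_entry, the function iotlb_translate and the ACCESS_* constants are hypothetical names, and a real backend would respond to a miss by asking the frontend for the mapping instead of simply failing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Permission bits for a mapping (hypothetical names). */
#define ACCESS_RO 0x1u
#define ACCESS_WO 0x2u
#define ACCESS_RW 0x3u

/* One cached translation: an I/O virtual address range that the guest
 * has mapped for the device, and where it lives in the backend. */
struct iotlb_entry {
    uint64_t iova;      /* start of the range as the device sees it */
    uint64_t uaddr;     /* backend virtual address of that start */
    uint64_t size;      /* length of the range in bytes */
    uint8_t  perm;      /* ACCESS_* bits granted by the guest */
};

/* Translate an access of len bytes at iova that needs permission perm.
 * Returns the backend virtual address, or 0 if the range is not
 * mapped, runs past its mapping, or lacks the required permission. */
uint64_t iotlb_translate(const struct iotlb_entry *tlb, size_t n,
                         uint64_t iova, uint64_t len, uint8_t perm)
{
    for (size_t i = 0; i < n; i++) {
        const struct iotlb_entry *e = &tlb[i];

        if (iova < e->iova || iova - e->iova >= e->size)
            continue;                       /* not in this entry */

        uint64_t off = iova - e->iova;
        if (len > e->size - off)
            return 0;                       /* would cross the mapping end */
        if ((e->perm & perm) != perm)
            return 0;                       /* guest did not grant this access */
        return e->uaddr + off;
    }
    return 0;                               /* miss: mapping unknown */
}
```

Treating a miss, an overrun and a permission failure identically keeps the fast path simple: the descriptor is not processed until a valid mapping exists.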

Logging

Logging in the DPDK vhost-user library is configured in modern versions through the dynamic logging system, which lets the developer set output levels separately for different modules, for example for the data path and for control commands. The messages are compiled into the binary by default, unlike the old approach where debug messages were often excluded at build time. To obtain a detailed vhost operation log at runtime, raise the level with the standard EAL --log-level parameter. Releases in which the vhost module logs under the USER1 type accept --log-level=user1,debug; releases with dedicated vhost log types (such as lib.vhost.config and lib.vhost.data) need a pattern that matches those types, for example --log-level=lib.vhost.*:debug. The administrator can then redirect the detailed messages to syslog or a file for subsequent analysis.

Limitations

The key limitation of vhost-user is that guest memory must be shared (share=on on the QEMU memory backend) and, in typical deployments, backed by pre-allocated hugepages; without a shareable region, DPDK cannot map guest memory into its address space. In addition, some optimizations work only when their specifics are taken into account. For example, dequeue zero-copy on the VM-to-NIC path (in releases that provide it) requires fine-tuning tx_free_threshold in the NIC PMD driver (e.g., i40e); otherwise the descriptors in the queue may be exhausted. Using zero-copy together with vfio-pci in IOMMU mode is also impossible without first setting up DMA translations for guest memory.
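
Why the shared mapping is mandatory can be shown with plain mmap, independent of QEMU or DPDK. In the sketch below, the function name write_is_visible is hypothetical, a temporary file stands in for the guest memory backend, and the "guest" and "backend" mappings live in one process purely for demonstration. A write through a MAP_SHARED mapping reaches the other mapping of the same file; a write through a MAP_PRIVATE mapping stays in a private copy, which is the situation a backend would face without share=on.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Map the same file twice: once as the "guest" with guest_flags
 * (MAP_SHARED or MAP_PRIVATE) and once as the "backend" with
 * MAP_SHARED. Returns 1 if a guest write is visible to the backend,
 * 0 if it is not, and -1 if the setup failed. */
int write_is_visible(int guest_flags)
{
    const size_t len = 4096;
    int result = -1;

    FILE *f = tmpfile();            /* stand-in for the memory backend */
    if (f == NULL)
        return -1;
    int fd = fileno(f);
    if (ftruncate(fd, (off_t)len) != 0) {
        fclose(f);
        return -1;
    }

    char *guest = mmap(NULL, len, PROT_READ | PROT_WRITE,
                       guest_flags, fd, 0);
    char *backend = mmap(NULL, len, PROT_READ | PROT_WRITE,
                         MAP_SHARED, fd, 0);

    if (guest != MAP_FAILED && backend != MAP_FAILED) {
        strcpy(guest, "packet");                    /* guest posts data */
        result = (strcmp(backend, "packet") == 0);  /* backend looks */
    }

    if (guest != MAP_FAILED)
        munmap(guest, len);
    if (backend != MAP_FAILED)
        munmap(backend, len);
    fclose(f);
    return result;
}
```

With MAP_PRIVATE the first write triggers a copy-on-write of the page, so the rings and buffers the guest fills would never become visible to the backend.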

History and evolution

The integration of vhost-user into DPDK began as a response to the limitations of in-kernel vhost, with the aim of reaching speeds close to hardware. It has progressed from the first version in DPDK v16.07 to a full-fledged production library; the Open vSwitch community and other projects such as Tungsten Fabric subsequently replaced their own, obsolete implementations with the upstream one. Important development milestones include the addition of IOMMU support in version 17.11 to improve security, the introduction of dynamic logging in version 20.02 to improve debugging, and the implementation of the packed ring layout and postcopy migration support, which continue to expand the functionality and performance of the framework.