PVSCSI (Paravirtual SCSI) is a paravirtualized storage controller and matching guest driver for VMware environments that replaces emulation of real hardware with a software interface. It allows the guest operating system to interact directly with the hypervisor, reducing CPU overhead and increasing I/O throughput.
The driver is used in heavily loaded virtual machines on the VMware vSphere platform where low latency and intensive data exchange are critical: database servers, file storage, and mail systems. It is available in both Linux and Windows guest operating systems on virtual machine hardware version 7 and later, via VMware Tools or the open-source open-vm-tools package.
Typical problems
The main difficulties are associated with a sharp increase in the command queue under high load, leading to ring buffer overflow and I/O errors. On outdated driver versions, a significant performance drop is observed when virtual machine snapshots are enabled. Another problem is the incompatibility of older PVSCSI versions with disaster recovery features, requiring manual replacement of the controller with LSI Logic SAS before migration.
How PVSCSI works
The mechanism is based on a direct communication channel between the guest OS and the ESXi hypervisor that does not emulate the register-level behavior of a physical device. The virtual machine sees a logical controller with a PCI interface, but data exchange takes place through a shared memory area, the paravirtual interface, rather than through emulated hardware. The guest driver forms read or write requests and places command descriptors into a ring buffer, a data structure located in memory accessible to both the guest and the hypervisor. Each descriptor contains a pointer to a data fragment, the block size, and an operation code.
The hypervisor extracts commands directly from this buffer, processes them, and places responses back, then sends a virtual interrupt to notify the guest system of completion. This scheme is fundamentally different from LSI Logic or BusLogic SCSI adapter emulation, where each signal requires complex processing and virtual processor context switching. PVSCSI employs batch processing mechanisms: in a single scheduling cycle, the hypervisor can extract and execute multiple commands at once, minimizing the number of costly guest context exits.
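The request path described above can be illustrated with a minimal sketch. It is a simplified model, not the actual PVSCSI ring layout: the names Descriptor, Ring, prod_idx, and cons_idx are assumptions made for illustration, and real descriptors carry more fields than the three mentioned in the text.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Descriptor:
    """Hypothetical command descriptor with the three fields named in the text."""
    opcode: int      # operation code, e.g. 0x28 for READ(10)
    data_addr: int   # guest-physical address of the data fragment
    length: int      # block size in bytes


class Ring:
    """Fixed-size ring shared by a producer (guest) and a consumer (hypervisor)."""

    def __init__(self, slots: int) -> None:
        if slots <= 0 or slots & (slots - 1):
            raise ValueError("slots must be a power of two")
        self.slots = slots
        self.entries: List[Optional[Descriptor]] = [None] * slots
        self.prod_idx = 0  # advanced only by the guest side
        self.cons_idx = 0  # advanced only by the hypervisor side

    def free_slots(self) -> int:
        # Indices grow monotonically; their difference is the fill level.
        return self.slots - (self.prod_idx - self.cons_idx)

    def submit(self, desc: Descriptor) -> bool:
        """Guest side: place one descriptor, or report that the ring is full."""
        if self.free_slots() == 0:
            return False
        self.entries[self.prod_idx & (self.slots - 1)] = desc
        self.prod_idx += 1
        return True

    def drain(self) -> List[Descriptor]:
        """Hypervisor side: take every pending descriptor in one pass (a batch)."""
        batch: List[Descriptor] = []
        while self.cons_idx != self.prod_idx:
            batch.append(self.entries[self.cons_idx & (self.slots - 1)])
            self.cons_idx += 1
        return batch
```

In this model the guest can call submit several times and notify the hypervisor once, after which a single drain call picks up the whole batch. That is the property the text credits with reducing guest exits.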
Queue management with dynamic balancing plays a special role. The shared area holds separate rings for different traffic types (requests, completions, and adapter messages), and the Device Queue Depth parameter defines the maximum number of outstanding operations per target device. As load increases, the driver lets queue occupancy grow to keep latency acceptable, but excessive depth overloads the storage and increases guest filesystem response time. The driver also supports MSI-X interrupts, which lets command completions be distributed across virtual processors in SMP guests and removes the bottleneck of a single core during intensive I/O.
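The effect of a per-device queue depth limit can be sketched as follows. This is an illustrative model only: DepthLimitedQueue and its methods are hypothetical names, and the real driver and OS scheduler apply the limit in a more involved way.

```python
from collections import deque
from typing import Deque, Hashable, Set


class DepthLimitedQueue:
    """Caps the number of outstanding commands; extra requests wait in 'held'."""

    def __init__(self, queue_depth: int) -> None:
        if queue_depth < 1:
            raise ValueError("queue_depth must be at least 1")
        self.queue_depth = queue_depth
        self.in_flight: Set[Hashable] = set()   # tags issued to the device
        self.held: Deque[Hashable] = deque()    # tags waiting for a free slot

    @property
    def outstanding(self) -> int:
        return len(self.in_flight)

    def submit(self, tag: Hashable) -> bool:
        """Return True if the request was issued at once, False if it was held."""
        if self.outstanding < self.queue_depth:
            self.in_flight.add(tag)
            return True
        self.held.append(tag)
        return False

    def complete(self, tag: Hashable) -> None:
        """Retire a finished request and promote the oldest held one, if any."""
        self.in_flight.remove(tag)  # raises KeyError for an unknown tag
        if self.held:
            self.in_flight.add(self.held.popleft())
```

With a depth of 2, a third submit call is held until one of the first two requests completes. A larger depth admits more parallel work but, as noted above, can push the backing storage into overload.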
PVSCSI functionality
- Direct hypervisor interaction mechanism. The driver uses shared memory between the virtual machine and the VMkernel layer to transfer SCSI commands. Instead of intercepting and translating each port access, the guest OS places request descriptors into the ring buffer queue, and the hypervisor directly retrieves them for processing.
- Ring buffer architecture. The key component is the request queue implemented as a ring buffer in shared memory. The guest system increments the producer index after adding a request, and VMkernel updates the consumer index. This mechanism eliminates costly guest environment exit operations (VM exits) for each request.
- Batch command processing. PVSCSI allows the guest OS to place multiple command descriptors into the ring buffer before notifying the hypervisor. Thanks to this, a single guest context exit operation initiates processing of an entire batch of SCSI requests, which significantly reduces the total number of context switches during intensive I/O.
- Interrupt and notification subsystem. VMkernel notifies the guest OS of command completion via a paravirtualized interrupt mechanism. The driver configures interrupt masking to reduce the event flow under high load, using a technique of periodic polling of completion flags in the ring buffer instead of processing each individual signal.
- Interrupt request handling. The PVSCSI driver registers a standard interrupt handler that, upon activation, reads status fields from the request completion area. When using reduced interrupt mode, the handler scans the queue in a single pass, identifying all completed commands, and groups their transfer to the upper level of the block device stack.
- I/O manager integration. The driver plugs into the guest OS storage stack through the native adapter interface (a Storport miniport on Windows, a SCSI low-level driver on Linux), providing a standard interface to the I/O manager. This allows disk requests from filesystems to be passed directly to the paravirtualized adapter without involving emulation layers, while remaining fully compatible with native operating system mechanisms.
- Scatter-gather requests. To optimize work with large data blocks, the driver supports scatter-gather lists in an extended format. A single ring buffer element can describe several physically non-contiguous memory fragments, which reduces the number of occupied slots and increases throughput.
- SCSI management command processing. In addition to read and write, PVSCSI provides transparent pass-through of management commands such as REPORT LUNS, INQUIRY, and MODE SENSE to the virtual SCSI target device in VMkernel. This allows the guest OS to correctly discover attached disks and dynamically track configuration changes without rebooting.
- Direct memory access mode. Data exchange is performed exclusively through DMA transactions between guest physical memory and storage. The PVSCSI driver programs the controller, passing it a list of guest memory pages permitted for direct access. The hypervisor provides address translation through nested page tables, ensuring VM isolation integrity.
- Hardware queue support. The controller emulates the presence of a deep hardware queue of outstanding commands. Guest software can send hundreds of requests without waiting for previous ones to complete, and VMkernel internally reorders them for optimal dispatching to the physical array using its own schedulers.
- Hot add function. The PVSCSI adapter supports dynamic addition of new virtual disks without shutting down the virtual machine. Upon receiving a control signal from VMkernel over the paravirtualized channel, the driver initiates SCSI bus rescanning, sending REPORT LUNS, which makes the new device immediately available to the operating system.
- Power management and timeout policy. The driver implements power state management methods, allowing the hypervisor to transition an unused adapter into a reduced power consumption mode. Additionally, PVSCSI supports configurable periodic reset of hung commands by timeout with the ability to forcibly abort a request without destabilizing the entire SCSI bus.
- Relationship to virtual NVMe. Since virtual hardware version 13 (vSphere 6.5), VMware also provides a virtual NVMe controller. It is a separate virtual device with its own guest driver and multi-queue I/O stack rather than a layer on top of PVSCSI, so a given virtual disk is attached to one controller type or the other.
- Guest-level queue management. The driver regulates queue depth by dynamically changing the limit of outstanding commands according to the DeviceQueueDepth parameter. When the ring buffer fill threshold is reached, requests are temporarily held in the OS scheduler until slots are freed, preventing overflow and possible command loss.
- Error and disconnect state handling. PVSCSI detects storage link disconnect states by the absence of responses from VMkernel within a specified interval. Upon timeout detection, the controller performs a soft bus reset, reinitializing the ring buffer and negotiation parameters without data loss in already cached guest filesystem volumes.
- Shared memory access control. Interaction security is based on access descriptors for shared memory pages allocated by VMkernel. The driver must explicitly request mapping of these pages into its address space. Any attempt of unauthorized access beyond allocated buffers is blocked by the hypervisor through memory protection mechanisms.
- Load balancing in multiprocessor systems. In SMP guest environments, the PVSCSI driver binds interrupt handling and command submission to different interrupt vectors using MSI-X. This allows distributing I/O operation processing across different cores, avoiding spin lock contention on a single queue and ensuring linear throughput scaling.
- Internal state synchronization. To protect ring buffers from races in a multiprocessor environment, lightweight atomic operations are used when updating producer indices. Only one kernel thread can advance the submission index at a given moment, and memory barrier controls guarantee visibility of written descriptors to VMkernel.
- Interaction with vStorage API framework. PVSCSI works alongside the vStorage APIs for Array Integration (VAAI). The offload primitives themselves are issued by VMkernel to the physical array; the guest driver only forwards SCSI commands over the paravirtual channel, and VMkernel decides whether a request can be offloaded instead of being expanded into a sequence of standard SCSI read and write commands.
- Adapter reset and recovery. Upon fatal errors, the driver initiates a full adapter reset procedure. During this, all occupied DMA buffers are freed, shared memory pages are remapped, initial queue index values are set, and interrupt reinitialization is performed, after which request processing resumes from a clean state.
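The scatter-gather idea from the list above (one request describing several physically non-contiguous fragments) can be illustrated with a short sketch. It assumes 4 KiB pages and a hypothetical helper build_sg_list; the actual descriptor format used by the driver is not reproduced here.

```python
from typing import List, Sequence, Tuple

PAGE_SIZE = 4096  # assumed page size for this illustration


def build_sg_list(page_frames: Sequence[int], offset: int, length: int) -> List[Tuple[int, int]]:
    """Turn a virtually contiguous buffer into (physical_address, length) segments.

    page_frames lists the physical frame numbers backing the buffer in order,
    offset is the byte offset of the buffer inside the first page, and length
    is the buffer size in bytes. Physically adjacent pages are merged into one
    segment so that fewer entries are needed.
    """
    if not 0 <= offset < PAGE_SIZE:
        raise ValueError("offset must fall inside the first page")
    if length <= 0:
        raise ValueError("length must be positive")

    segments: List[Tuple[int, int]] = []
    remaining = length
    for i, pfn in enumerate(page_frames):
        if remaining == 0:
            break
        start = offset if i == 0 else 0
        chunk = min(PAGE_SIZE - start, remaining)
        addr = pfn * PAGE_SIZE + start
        if segments and segments[-1][0] + segments[-1][1] == addr:
            # Continues exactly where the previous segment ended: extend it.
            prev_addr, prev_len = segments[-1]
            segments[-1] = (prev_addr, prev_len + chunk)
        else:
            segments.append((addr, chunk))
        remaining -= chunk

    if remaining:
        raise ValueError("buffer is larger than the supplied pages")
    return segments
```

For a three-page buffer backed by frames 10, 11, and 20, the first two pages merge into one 8 KiB segment and the third becomes a second segment, so the request needs two entries instead of three.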
Comparisons
- PVSCSI vs LSI Logic SAS. PVSCSI is a paravirtualized adapter developed specifically for VMware environments, while LSI Logic SAS emulates real physical hardware. PVSCSI works directly with the hypervisor through a high-speed ring buffer, bypassing the heavy SCSI emulation stack. This provides PVSCSI with significantly higher throughput and minimal latency, whereas LSI Logic SAS consumes more CPU resources for processing virtualized interrupts and I/O operations.
- PVSCSI vs vSCSI (Virtual SCSI). Unlike the universal vSCSI used for SCSI device pass-through, PVSCSI does not translate physical adapter commands but uses a lightweight exchange protocol between the guest OS and vSphere storage. This architectural difference makes vSCSI dependent on underlying hardware and introduces additional latency. PVSCSI aggressively optimizes the command queue, allowing more effective masking of array delays and achieving higher IOPS per virtual device.
- PVSCSI vs NVMe over Fabrics (NVMe-oF). PVSCSI is fundamentally tied to a SCSI command queue with a depth of up to 256 requests per device, whereas NVMe-oF uses multi-thousand queue pairs and remote direct memory access transfers. Despite SCSI protocol latencies, the PVSCSI driver demonstrates efficiency close to paravirtualized NVMe in synchronous workloads thanks to interrupt optimization, but loses in multithreading and scaling on modern flash arrays where parallelism is critical.
- PVSCSI vs vmxnet3 (network stack comparison). Although vmxnet3 is a network rather than a disk adapter, their architectures are similar: both use shared memory and mechanisms to avoid unnecessary data copies. PVSCSI, however, works with block data and sits in front of the VMkernel storage stack, including the path failover logic of the Pluggable Storage Architecture, which has no counterpart in the vmxnet3 network stack. This makes PVSCSI more sensitive to driver versions and requires strict adherence to best practices for timeouts and ring buffer size.
- PVSCSI vs VirtIO SCSI. In the KVM ecosystem, the analog is VirtIO SCSI, which, like PVSCSI, is a paravirtualized HBA. The key difference lies in abort/reset command processing: PVSCSI manages them at the hypervisor level more aggressively to avoid LUN lockup during array failures. Additionally, paravirtualized SCSI command pass-through support in PVSCSI is historically more tightly integrated with the VMkernel I/O scheduler, giving an advantage in environments with active Storage DRS and frequent volume migration.
OS and driver support
The PVSCSI controller functions through a paravirtualized driver supplied with VMware Tools or available alongside the open-source open-vm-tools package. In Windows guest operating systems, the driver is a Storport miniport that processes asynchronous SCSI requests through the ring buffers and direct access to hypervisor shared memory, without physical HBA emulation. In Linux, the vmw_pvscsi kernel module registers with the SCSI mid-layer, enabling tagged command queueing with a depth of up to 256 requests per LUN and MSI-X interrupts for load distribution across vCPUs. Guest OS support formally begins with Windows Server 2008 R2 and Linux kernels of the 2.6.32 to 2.6.33 generation (the module entered the mainline tree in 2.6.33 and was backported by some distributions). Stable operation with automatic driver loading (vmw_pvscsi.ko, or pvscsi.ko in older VMware Tools builds) is available in all modern mainstream distributions and Windows guests on virtual hardware version vmx-07 and later.
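On a Linux guest, a quick way to check whether the module is loaded and how it is configured is to read its parameters from sysfs. The sketch below is a hedged example: read_module_parameters is a hypothetical helper, the standard /sys/module/<name>/parameters layout is assumed, and which parameters appear (for example ring_pages or cmd_per_lun in the mainline driver) depends on the kernel version and may require root to read.

```python
from pathlib import Path
from typing import Dict


def read_module_parameters(module: str = "vmw_pvscsi",
                           sysfs_root: str = "/sys/module") -> Dict[str, str]:
    """Return {parameter_name: value} for a loaded kernel module.

    An empty dict means the module is not loaded (or is built in without
    exported parameters). Unreadable entries are skipped.
    """
    params_dir = Path(sysfs_root) / module / "parameters"
    if not params_dir.is_dir():
        return {}

    result: Dict[str, str] = {}
    for entry in sorted(params_dir.iterdir()):
        try:
            result[entry.name] = entry.read_text().strip()
        except OSError:
            # Typically a permission error when not running as root.
            continue
    return result
```

Calling read_module_parameters() with no arguments inspects the running system; the sysfs_root argument exists only so the function can be pointed at a test directory.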
Security
PVSCSI security is based on a direct paravirtualized I/O path that avoids emulating legacy hardware interfaces with historically vulnerable attack surfaces, such as floppy controllers or EHCI. All command descriptors in the request ring are validated by the hypervisor, which checks the bounds of guest physical addresses and page access rights. As a result, the guest cannot use the device to mount a DMA attack on host memory or other VMs: VMkernel rejects any descriptor that references address space outside the segment allocated to the VM. Where the host uses an IOMMU (Intel VT-d or AMD-Vi), it adds a further layer of isolation for DMA transactions. The integrity of critical data structures (the initialization queue and driver context) is maintained by privilege separation: the guest driver only fills the ring with requests, while physical interaction with storage is performed exclusively in the privileged hypervisor context after the descriptors pass validation. This limits the impact of a compromise even when an attacker has full control over the guest OS.
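The boundary check described above can be sketched in a few lines. This is an illustration of the principle only, under the simplifying assumption that guest physical memory is one contiguous range starting at address 0; validate_segments is a hypothetical name and does not correspond to any VMkernel function.

```python
from typing import Iterable, Tuple


def validate_segments(segments: Iterable[Tuple[int, int]], guest_mem_size: int) -> bool:
    """Accept a descriptor only if every (address, length) pair stays inside
    the guest's own physical memory, modeled here as [0, guest_mem_size)."""
    for addr, length in segments:
        if length <= 0:
            return False  # empty or negative-length fragments are malformed
        if addr < 0 or addr + length > guest_mem_size:
            return False  # fragment starts or ends outside the VM's memory
    return True
```

A request whose data fragment ends one byte past the VM's memory is rejected as a whole, which mirrors the behavior the text attributes to VMkernel.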
Logging
Logging of PVSCSI operations on the ESXi side is implemented through the vSCSI component in VMkernel, which writes structured events with module code vmw_pvscsi or vscsi to /var/log/vmkernel.log. These events record adapter resets via the HBA Reset command, command timeouts with tags, interrupt errors, ring buffer allocation failures, and abort states of individual SCSI requests, each tied to a World ID and target LUN. When the extended Verbose level is enabled via vsish or esxcli system settings, the log additionally receives descriptor fill traces, interrupt acknowledgments, and full queue draining events before hot-remove.

At the guest level, the Windows driver writes events to the system log via ETW tracing on the Microsoft-Windows-StorPort channel, logging SCSI command codes (INQUIRY, TEST UNIT READY, REPORT LUNS) and critical configuration update failures that are visible in Event Viewer. In Linux, the kernel module exports debugging information through debugfs (/sys/kernel/debug/vmw_pvscsi), where interrupt statistics, reset counts, and queue states are visible. Errors and resets are logged via printk at the KERN_ERR level or through dynamic debug, and the administrator can enable detailed output by changing the bitmask of the vmw_pvscsi.loglevel parameter.
Limitations
Technical limitations of PVSCSI include a maximum command queue depth per device of 256 requests for the standard ring and up to 1024 when using the second-generation paravirtualized queue (PVSCSIv2). The controller supports no more than 15 target devices per bus (target IDs 0 to 15, with ID 7 reserved for the controller itself) and up to 255 LUNs per target, which limits the total number of addressable volumes in a configuration without dynamic expansion. The driver does not support raw pass-through of low-level SCSI operations, such as manually sending an arbitrary CDB to the physical HBA, because the paravirtualized model converts guest requests into VMFS or vVol operations on the VMkernel side and does not forward raw opcodes. Furthermore, the controller does not provide full SCSI-3 Persistent Reservations support for cluster services on RDM disks in virtual compatibility mode when physical bus sharing is used. Booting from PVSCSI is possible only on virtual hardware version 7 and later with BIOS and version 13 with UEFI, with the firmware setting declared in the guest .vmx configuration, which limits its use in old VMs migrated from vSphere releases earlier than 4.1.
History and development
The PVSCSI controller was introduced by VMware in vSphere 4.0 (2009) as an evolutionary shift from emulated LSI Logic SAS to a high-performance paravirtualized stack. It initially implemented a single queue with a fixed depth of 64 requests and supported only SCSI-2 reservations.

In vSphere 5.1, the driver architecture was reworked around a single ring barrier mechanism, which allowed queue depth to scale to 256 commands and reduced processor cache line synchronization overhead by avoiding frequent MMIO notifications. With vSphere 6.7 and virtual hardware version 14, second-generation PVSCSI (vmw_pvscsi v2) support appeared. It introduced separate rings for commands and completions, large memory pages for descriptors (HugeTLB), and full integration with the Persistent Memory (PMem) subsystem for guest NVDIMMs, achieving over 3 million IOPS on a single virtual controller.

Modern PVSCSI iterations in vSphere 8.x additionally implement NUMA-aware placement of ring buffers in the memory of different sockets, Dynamic Queue Depth Scaling based on backend storage latency, and support for guest NVMe-over-Fabrics by translating PVSCSI requests into NVMe/TCP sessions at the VMkernel level. Driver backward compatibility is maintained down to guest Windows Server 2008 R2 to ease migration of legacy systems.