What is HBM (3D stacked memory with silicon vias)

HBM is a stack of several DRAM chips combined into a single microassembly. Instead of placing chips side by side on a circuit board, they are stacked on top of each other and connected by vertical conductors passing through the silicon. This arrangement allows data to be transferred over an extremely wide bus, achieving enormous bandwidth with significantly lower power consumption compared to traditional solutions.

The main application areas are graphics accelerators and specialized computing units for data centers, including AI accelerators and high-performance computing systems. This memory is also used in high-end network switches and some FPGAs where minimal latency in processing continuous data streams is required and the speed of communication between logic and memory is critical.

The complexity of manufacturing through-silicon vias and microassemblies results in high cost and limited supply. Heat dissipation from the neighboring processor causes uneven heating of the vertical stack, requiring complex cooling systems and temperature profile management. Furthermore, tight integration eliminates upgrade possibilities: the stacks are permanently assembled into the same package as the processor, and the processor's memory controller and PHY are rigidly tied to a specific generation and stack capacity.

How HBM works

The operating principle is based on combining several DRAM layers using Through-Silicon Via (TSV) technology: microscopic vertical copper channels that pierce each chip. These chips are mounted on a base logic die that functions as a buffer and as the stack's interface to the host. The key difference from traditional memory lies in the interface organization: data exchange occurs not over a 64-bit bus but over a 1024-bit bus divided into eight independent 128-bit channels. Each channel operates at a relatively low clock frequency, but parallelism achieves enormous aggregate bandwidth. The base die receives commands from the memory controller in the main processor over this wide parallel interface and distributes them as write or read operations to the corresponding memory banks inside the stack. The physical proximity of the chips keeps connections very short, down to fractions of a millimeter inside the stack, which minimizes latency and parasitic line capacitance and allows low signaling voltages. As a result, HBM transmits data on a wide front, like replacing a single-lane road with a thousand-lane highway.
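
The trade-off between bus width and per-line speed can be checked with simple arithmetic. The sketch below multiplies bus width by per-pin data rate; the function name and the per-pin rates (2.0 and 6.4 Gbit/s) are illustrative assumptions, not figures from the text above.

```python
def peak_bandwidth_gbs(bus_width_bits: int, pin_rate_gbps: float) -> float:
    """Peak bandwidth in GB/s: bus width (bits) x per-pin rate (Gbit/s) / 8."""
    return bus_width_bits * pin_rate_gbps / 8


# Assumed per-pin rates, chosen only to illustrate the trade-off.
hbm_stack = peak_bandwidth_gbs(1024, 2.0)    # whole stack: 8 channels x 128 bits
hbm_channel = peak_bandwidth_gbs(128, 2.0)   # one 128-bit channel
narrow_bus = peak_bandwidth_gbs(64, 6.4)     # 64-bit bus with much faster pins

print(f"HBM stack:   {hbm_stack:.1f} GB/s")
print(f"HBM channel: {hbm_channel:.1f} GB/s")
print(f"64-bit bus:  {narrow_bus:.1f} GB/s")
```

Under these assumptions the wide bus delivers several times the bandwidth of the narrow one even though each of its lines runs about three times slower.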

HBM functionality

Multilayer memory organization. HBM implements three-dimensional heterogeneous integration through vertical stacking of DRAM chips on a base logic die. This architecture radically reduces physical interconnect length compared to the planar layout of DIMM modules placed on a motherboard.
TSV-based physical interface. Vertical connections between DRAM layers in the stack are made through Through-Silicon Vias. Thousands of copper-filled microvias pierce the chips through their thickness, forming an internal bus. Replacing wire bonding with TSVs reduces parasitic inductance and capacitance, minimizing signal delay.
Ultra-wide I/O bus. Unlike the narrow 64-bit bus of standard DRAM modules, HBM uses an interface 1024 bits wide per stack. This is achieved not by increasing clock frequency but by extensively increasing the number of lines. Bandwidth is defined as the product of effective frequency and channel width, shifting the architectural emphasis toward parallelism.
Microbump contacts. The DRAM chip stack connects to the silicon interposer via an array of microbumps. The interposer, manufactured using mature CMOS technology, contains several metal layers that route signals from the HBM stack to the computing chip (GPU or ASIC) with minimal insertion loss.
Silicon interposer as a switching medium. The interposer acts as a passive bridge. It carries signals between the physical layer (PHY) of the stack and the PHY of the SoC over a dense array of parallel traces, without converting them. These traces allow HBM to be placed very close to the processor within a single 2.5D assembly.
Independent pseudo-channels. Each 128-bit HBM channel is further split into two 64-bit pseudo-channels. They share command and address lines but operate with separate bank queues and data. This semi-independence allows efficient handling of interleaved requests, hiding row activation latency through pipelining at the memory controller level.
Burst length BL4 and pseudo-channel mode. In pseudo-channel mode the data burst length is four transfers (Burst Length 4). On a 64-bit pseudo-channel this yields a 32-byte transaction. Legacy mode, where supported, uses BL2 across the full 128-bit channel, which gives the same 32-byte access granularity.
Distributed memory bank system. In HBM2e/HBM3 stacks, the number of banks per channel can reach 32. The memory controller can keep multiple rows open simultaneously in different banks of independent pseudo-channels. Increased bank-level parallelism improves the chance of a page hit, reducing average access latency.
Command addressing and inter-layer synchronization. Command packets are transmitted over differential lines with single-clock strobing. Row and column address bits are time-multiplexed. To eliminate signal skew between different physical layers of the stack, interface training is performed during initialization with per-bit deskew.
Power saving modes. The HBM specification includes low-power states: Standby, Self-Refresh, and deep sleep. When there are no accesses, the controller disables input buffers and data line terminators, putting the interface into standby mode. Self-sustaining data refresh in Self-Refresh mode preserves memory integrity during system idle periods.
Temperature monitoring and throttling. Built-in thermal diodes on each DRAM chip continuously measure temperature. When a specified thermal threshold is reached, the memory controller can dynamically reduce transaction frequency (throttling) or initiate forced refresh. Thermal management is critical due to the high packing density of the stack and proximity to a hot processor.
ECC and RAS support. Modern HBM generations integrate on-die logic for error checking and correction. Hardware ECC operates transparently to the host, correcting single-bit errors and detecting multi-bit errors. The RAS architecture is supplemented by the ability to read error syndromes for predictive reliability analytics in data centers.
DFI interface for the memory controller. Communication between the memory controller and the HBM PHY is standardized via the DDR PHY Interface. A DFI-compliant controller translates bus requests into sequences of activation, read, write, and refresh commands, precisely observing DRAM bank timing parameters (tRAS, tRCD, tRP).
Direct access and coherence. The HBM protocol is not inherently coherent in the context of heterogeneous computing. When used with accelerators that have their own caches, coherence management is delegated to bus protocols such as CXL, which are layered over the physical memory pool and provide shared address space semantics.
Interposer routing and signal integrity. High-speed lines on the interposer require careful impedance matching. Differential signaling with on-die termination is used to suppress reflections. Power delivery network integrity is critical: decoupling capacitors are placed directly on the substrate to compensate for current surges during switching.
Pseudo-channel address interleaving. The memory controller applies physical address hashing to distribute traffic evenly across pseudo-channels. The algorithm seeks to avoid conflicts during sequential access by spreading adjacent row addresses across different banks and layers, maximizing channel utilization and minimizing command collisions.
Fault tolerance and line repair. Defective cells and TSV lines are identified during testing. Built-in self-repair logic replaces them with redundant elements by programming one-time programmable fuses or configuration registers, ensuring acceptable stack yields despite manufacturing defects.
Dual-mode data access. The controller may optionally support pseudo-channel locking mode, temporarily consolidating bandwidth for large atomic transactions. This reduces data fragmentation and is useful for streaming direct memory access operations where continuous byte flow matters more than parallelism of small requests.
Refresh specifics in 3D stacks. All DRAM layers in the stack must be refreshed synchronously. The controller issues a broadcast refresh command that activates on all levels simultaneously. Refresh latency (tRFC) determines how long banks are unavailable, and distributed command execution schemes can mask this pause under other operations.
Integration with modern protocols. Higher-level transport layers like UCIe (Universal Chiplet Interconnect Express) are layered on top of the HBM physical base. HBM3 allows advanced clocking schemes with independent strobing for sub-channels, adapting the stack for use as a high-speed shared buffer in multi-chiplet assemblies.
Energy efficiency per bit transferred. The key metric for HBM is picojoules per transferred bit. Thanks to small signal amplitudes (low-voltage signaling with supply voltages of roughly 1.2 V and below) and short physical traces on the interposer, dynamic power consumption is far lower than in traditional DRAM mounted on a printed circuit board.
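
The pseudo-channel interleaving described in the list above can be illustrated with a small sketch. The XOR-folding hash, the bit positions and all names here are assumptions made for illustration; real controllers use vendor-specific mappings. The 32-byte granularity and the eight channels with two pseudo-channels each are taken from the text.

```python
from collections import Counter

ACCESS_BYTES = 32                    # 64-bit pseudo-channel x BL4 = 256 bits
NUM_CHANNELS = 8
PSEUDO_PER_CHANNEL = 2
NUM_PC = NUM_CHANNELS * PSEUDO_PER_CHANNEL   # 16 pseudo-channels per stack


def pseudo_channel_for(addr: int) -> int:
    """Hypothetical hash: XOR-fold three 4-bit fields of the block index."""
    block = addr // ACCESS_BYTES
    low = block & (NUM_PC - 1)
    mid = (block >> 4) & (NUM_PC - 1)
    high = (block >> 8) & (NUM_PC - 1)
    return low ^ mid ^ high


def naive_pseudo_channel_for(addr: int) -> int:
    """No hashing: only the lowest block bits select the pseudo-channel."""
    return (addr // ACCESS_BYTES) & (NUM_PC - 1)


def spread(addresses, mapper):
    """Count how many accesses land on each pseudo-channel."""
    return Counter(mapper(a) for a in addresses)


sequential = [i * ACCESS_BYTES for i in range(64)]
strided = [i * ACCESS_BYTES * NUM_PC for i in range(16)]   # 512-byte stride

print("sequential, hashed:", len(spread(sequential, pseudo_channel_for)))
print("strided, hashed:   ", len(spread(strided, pseudo_channel_for)))
print("strided, naive:    ", len(spread(strided, naive_pseudo_channel_for)))
```

With the naive mapping, a 512-byte stride sends every access to the same pseudo-channel, while the hashed mapping spreads the same stream across all sixteen.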

Comparisons

HBM vs GDDR. The key difference lies in connection topology. HBM uses a silicon interposer and a 1024-bit bus, which gives very low energy consumption per transferred bit. GDDR remains a set of discrete chips on a printed circuit board, which makes it much cheaper to manufacture but physically limits bus width and requires much more power to reach comparable bandwidth.
HBM vs DDR5. System DDR5 memory is optimized for low random access latency and channel-mode operation with a central processor. HBM, in contrast, sacrifices latency for explosive bandwidth, targeting vector computations of graphics processors. DDR5 is flexible and expandable via slots, while HBM is statically integrated into the same package as the chip to minimize physical distance.
HBM vs Wide I/O. Both standards use TSV connections, but their target segments are diametrically opposite. Wide I/O is optimized for ultra-low power consumption in mobile SoCs, sacrificing bandwidth for the smartphone thermal budget. HBM focuses on maximum data center performance, allowing multi-story stacking of DRAM chips, creating much higher traffic density than required for mobile devices.
HBM vs GDDR6X (PAM4). GDDR6X uses multilevel PAM4 modulation to double the data rate per pin without raising the symbol rate, while HBM retains binary NRZ, relying on extreme parallelism. The paradox is that GDDR6X PAM4 encoding schemes are more complex and generate significantly more heat, while HBM runs cooler and still achieves higher aggregate bandwidth through architectural width rather than signal encoding complexity.
HBM vs CXL Memory. HBM is an ultra-close buffer memory, while CXL-based solutions place the memory on a physically separate device and provide coherence semantics over a PCIe interface. CXL memory scales horizontally for giant datasets, adding capacity at the cost of higher latency. HBM does not scale in capacity but provides low access latency for accelerators, acting more like a last-level cache than an address space extension.
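
The difference between NRZ and PAM4 signaling mentioned in the GDDR6X comparison can be shown in a few lines. This is a minimal sketch: the Gray-coded level mapping and the function names are assumptions for illustration, not the encoding defined by any GDDR6X specification.

```python
# Assumed Gray-coded mapping of bit pairs to four amplitude levels.
PAM4_LEVELS = {(0, 0): 0, (0, 1): 1, (1, 1): 2, (1, 0): 3}


def nrz_symbols(bits):
    """NRZ: one symbol per bit, two possible levels."""
    return list(bits)


def pam4_symbols(bits):
    """PAM4: one symbol per pair of bits, four possible levels."""
    if len(bits) % 2:
        raise ValueError("PAM4 needs an even number of bits")
    return [PAM4_LEVELS[(bits[i], bits[i + 1])] for i in range(0, len(bits), 2)]


bits = [1, 0, 0, 1, 1, 1, 0, 0]
print("NRZ symbols: ", nrz_symbols(bits))
print("PAM4 symbols:", pam4_symbols(bits))
```

The same eight bits need eight NRZ symbols but only four PAM4 symbols, which is how PAM4 doubles the data rate per pin at a fixed symbol rate. HBM takes the other route and adds lines instead of levels.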

OS and driver support

OS interaction with HBM is implemented in two main ways. The first is the UEFI Special Purpose Memory (SPM) mechanism: firmware marks regions with the EFI_MEMORY_SP attribute, and the Linux kernel reserves them as E820_TYPE_SOFT_RESERVED for later capture by drivers such as DAX/HMEM. The second is direct device driver control (e.g., NVIDIA CUDA): the driver hot-plugs HBM into the system RAM pool with the IORESOURCE_SYSRAM_DRIVER_MANAGED flag, creating a separate NUMA domain and making the memory available to standard allocators without BIOS modification. Drivers also implement ACPI interfaces (_ON/_OFF methods) for power management on server platforms.
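
On Linux, regions reserved as E820_TYPE_SOFT_RESERVED appear in /proc/iomem under the label "Soft Reserved". The sketch below extracts such ranges from text in that format. The function name and the sample listing are hypothetical and do not come from a real machine.

```python
import re

# Matches lines such as "  1000000000-13ffffffff : Soft Reserved".
IOMEM_LINE = re.compile(r"^\s*([0-9a-fA-F]+)-([0-9a-fA-F]+)\s*:\s*(.+?)\s*$")


def soft_reserved_ranges(iomem_text: str):
    """Return (start, end, size_bytes) for every 'Soft Reserved' entry."""
    ranges = []
    for line in iomem_text.splitlines():
        match = IOMEM_LINE.match(line)
        if match and match.group(3) == "Soft Reserved":
            start = int(match.group(1), 16)
            end = int(match.group(2), 16)
            ranges.append((start, end, end - start + 1))
    return ranges


# Hypothetical /proc/iomem excerpt, invented for this example.
SAMPLE = """\
00001000-0009ffff : System RAM
100000000-4ffffffff : System RAM
1000000000-13ffffffff : Soft Reserved
  1000000000-13ffffffff : dax0.0
"""

for start, end, size in soft_reserved_ranges(SAMPLE):
    print(f"{start:#x}-{end:#x}: {size / 2**30:.0f} GiB soft reserved")
```

On a real system the same function can be given the contents of /proc/iomem. Reading that file without root privileges typically shows zeroed addresses, so the ranges are only meaningful when read as root.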

Security

HBM hardware security is built around a comprehensive RAS (Reliability, Availability, Serviceability) system. ECC (SEC-DED) error correction and parity protection (Data/CA Parity) are integrated on the die, while HBM4 introduces Directed Refresh Management (DRFM) to mitigate the Row Hammer vulnerability, in which repeated activation of one row can corrupt data in adjacent cells. At the controller level, CRC data integrity checks and scrubbing (ECS) are implemented; the scrubber cyclically scans memory and corrects accumulated errors before they become uncorrectable.
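
The SEC-DED principle (single error correction, double error detection) can be demonstrated on a toy code. The sketch below uses an extended Hamming (8,4) code, an assumption chosen for brevity. Production ECC protects much wider words, and the function names are hypothetical.

```python
def encode(data4):
    """Encode 4 data bits into an 8-bit extended Hamming codeword.

    List index = bit position; index 0 holds the overall parity bit.
    """
    d1, d2, d3, d4 = data4
    p1 = d1 ^ d2 ^ d4            # covers positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4            # covers positions 2, 3, 6, 7
    p4 = d2 ^ d3 ^ d4            # covers positions 4, 5, 6, 7
    word = [0, p1, p2, d1, p4, d2, d3, d4]
    word[0] = sum(word[1:]) % 2  # make the total number of ones even
    return word


def decode(word):
    """Return (data_bits or None, status) for a possibly corrupted codeword."""
    w = list(word)
    syndrome = 0
    for pos in range(1, 8):
        if w[pos]:
            syndrome ^= pos      # XOR of set positions is 0 for a valid word
    parity_odd = sum(w) % 2 == 1
    if syndrome == 0 and not parity_odd:
        status = "ok"
    elif parity_odd:
        w[syndrome] ^= 1         # single error; syndrome 0 means the parity bit
        status = "corrected"
    else:
        return None, "uncorrectable"   # even parity, nonzero syndrome: 2 errors
    return [w[3], w[5], w[6], w[7]], status


codeword = encode([1, 0, 1, 1])
single = list(codeword)
single[5] ^= 1                   # flip one bit
double = list(codeword)
double[2] ^= 1                   # flip two bits
double[6] ^= 1

print(decode(codeword))
print(decode(single))
print(decode(double))
```

A scrubber applies the same idea in the background: it reads each word, corrects any single-bit error and writes the word back before a second error can make it uncorrectable.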

Logging

Telemetry and logging functions in HBM4 transition from passive monitoring to predictive analytics: specialized error counters at channel and bank levels log both correctable and uncorrectable failures, while distributed thermal sensors (up to 8 per stack) monitor overheating with accuracy around ±1.5°C. This data is passed via enhanced sideband interfaces and controller APB registers to lifecycle management software to identify failure patterns and correlate error rates with temperature profiles.

Limitations

The key limitations of HBM remain high cost (HBM4 is roughly 30% more expensive than HBM3E) and the technological difficulty of scaling capacity. DRAM density limits and maximum stack heights (12–16 layers) cap the capacity of a single stack, so memory struggles to keep pace with the growth of AI models, a gap often called the memory wall. Thermal management in dense 3D stacks is critically important and requires significant energy for cooling. There are also security risks: telemetry data on access patterns can create side channels for information leakage.
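
The capacity ceiling follows directly from stack height and die density. The sketch below works through the arithmetic; the 24 Gbit die density and the 140 GB working set are illustrative assumptions, and the function names are hypothetical.

```python
import math


def stack_capacity_gb(layers: int, die_density_gbit: int) -> float:
    """Capacity of one stack in GB: layers x die density (Gbit) / 8."""
    return layers * die_density_gbit / 8


def stacks_needed(working_set_gb: float, layers: int, die_density_gbit: int) -> int:
    """Number of stacks required to hold a working set entirely in HBM."""
    return math.ceil(working_set_gb / stack_capacity_gb(layers, die_density_gbit))


DIE_DENSITY_GBIT = 24     # assumed die density
WORKING_SET_GB = 140      # assumed working set size

for layers in (12, 16):
    capacity = stack_capacity_gb(layers, DIE_DENSITY_GBIT)
    count = stacks_needed(WORKING_SET_GB, layers, DIE_DENSITY_GBIT)
    print(f"{layers} layers: {capacity:.0f} GB per stack, {count} stacks needed")
```

Because the number of stacks that fit next to a processor is limited by package area, capacity per stack translates directly into a ceiling on total memory.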

History and development

HBM was standardized by JEDEC in 2013 as the first generation, HBM1. By HBM3E the memory had doubled the number of channels (from 8 to 16) and adopted process technologies that increased bandwidth to 1.2 TB/s per stack. HBM4 marks a paradigm shift: a transition to a 2048-bit bus architecture, the use of an N12FFC+ process for the base logic die, the introduction of FinFET and hybrid bonding, and the emergence of the optional SPHBM4 standard on an organic substrate without a silicon interposer, which expands memory applications beyond AI accelerators to CPUs and networking chips.