GDDR6X is a type of ultra-fast graphics memory developed by Micron and NVIDIA. Simply put, it is an evolution of video memory that uses four voltage levels for data transmission instead of the usual two. This approach allows twice as much information to be transferred per clock cycle without a significant increase in frequency, providing bandwidth close to one terabyte per second for flagship graphics cards.
This memory is used exclusively in high-performance consumer NVIDIA GeForce RTX 30 and RTX 40 series graphics cards, as well as in professional RTX accelerators. Its main task is to handle resource-intensive computations such as real-time ray tracing, 8K texture processing, and neural network training. GDDR6X remains a unique technology not found in competitor products or gaming consoles, which rely on GDDR6.
The main technical challenges are high heat generation and significant power consumption. Because the chips run so intensively, temperatures on the back side of the printed circuit board can reach critical levels, requiring massive cooling systems. Compared to regular GDDR6, total power draw rises with the data rate and with the added signal processing complexity, even where the energy spent per transferred bit is lower. The transition to PAM4 also makes the memory extremely sensitive to electromagnetic interference, jitter, and signal degradation over long traces, complicating board layout.
How GDDR6X works
The working principle is based on abandoning the traditional binary NRZ scheme, where one bit per symbol is transmitted using two signal states: zero and one. GDDR6X implements PAM4 technology, which uses four voltage amplitude levels to encode two bits of data per symbol interval. The transmitter does not simply signal the presence or absence of current; it drives one of four discrete states that can be represented as 00, 01, 10, and 11. Inside the chip, an analog-to-digital converter and a complex signal processor restore the original digital stream based on voltage level and phase. This approach doubles effective bandwidth at the cost of reduced signal-to-noise ratio, since the distance between adjacent voltage levels becomes critically small. To compensate, the interface employs error detection, transmit pre-emphasis techniques, and complex receiver calibration during operation, allowing the system to dynamically adjust to temperature and frequency changes to prevent data loss.
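The mapping between bit pairs and voltage levels can be illustrated with a minimal Python sketch. It is not the device's actual coding: the equally spaced, normalized levels and the natural binary order in `LEVELS` are assumptions made for illustration, and the function names are hypothetical.

```python
# Hypothetical mapping of bit pairs to normalized voltage levels.
# Assumption: equally spaced levels in natural binary order.
LEVELS = {(0, 0): 0.0, (0, 1): 1 / 3, (1, 0): 2 / 3, (1, 1): 1.0}

def pam4_encode(bits):
    """Pack a flat bit sequence into PAM4 levels, two bits per symbol."""
    if len(bits) % 2:
        raise ValueError("PAM4 needs an even number of bits")
    return [LEVELS[(bits[i], bits[i + 1])] for i in range(0, len(bits), 2)]

def pam4_decode(samples):
    """Recover bits by choosing the nearest nominal level for each sample."""
    bits = []
    for v in samples:
        pair = min(LEVELS, key=lambda p: abs(LEVELS[p] - v))
        bits.extend(pair)
    return bits

bits = [1, 0, 0, 1, 1, 1, 0, 0]
symbols = pam4_encode(bits)            # 4 symbols carry 8 bits
noisy = [v + 0.1 for v in symbols]     # offset smaller than half the level spacing
assert pam4_decode(noisy) == bits
```

The round trip succeeds only while the disturbance stays below half the spacing between adjacent levels (1/6 of the full swing here), which is why PAM4 tolerates far less noise than a two-level signal of the same amplitude.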
GDDR6X functionality
- PAM4 encoding: physical transmission principle. Unlike traditional NRZ with two signal levels, GDDR6X uses four-level pulse amplitude modulation (PAM4). Bit pairs (00, 01, 10, 11) are encoded as discrete voltage levels, so each symbol carries two bits of data instead of one. This doubles bandwidth without proportionally increasing the interface clock frequency.
- Doubling bandwidth per pin. The transition to PAM4 theoretically doubles data density relative to NRZ at the same Nyquist frequency. In practice, this provides bandwidth of up to 24 Gbps per pin, significantly exceeding the 16 Gbps limit typical of standard GDDR6. The total bandwidth of a 32-bit channel reaches 96 GB/s.
- Reduced carrier frequency requirements. Since two bits are packed into one symbol (Unit Interval), the fundamental signal transmission frequency for a given data rate is halved compared to NRZ. This relaxes requirements for PCB bandwidth and package materials, reducing high-frequency losses introduced by the communication channel.
- Energy efficiency per bit of information. Transmitting data at a lower Nyquist frequency reduces the interface dynamic power consumption. According to developer data, when moving one bit of information, the GDDR6X interface in PAM4 mode demonstrates approximately 15% less energy consumption compared to equivalent NRZ transmission at high speeds.
- Signal-to-noise ratio challenge. The reduced distance between signal levels (eye height) in PAM4 makes the interface significantly more vulnerable to noise and inter-symbol interference. The signal-to-noise ratio drops by about 9.5 dB compared to NRZ. This requires implementing complex error correction and equalization mechanisms directly into the memory subsystem architecture.
- Transmitter with feed-forward equalization (FFE). To compensate for losses and reflections in the communication channel, the GDDR6X transmitter is equipped with a two-tap finite impulse response filter (2-tap FFE). This mechanism pre-distorts the signal before transmission, increasing the vertical and horizontal eye opening at the receiver side.
- Receiver with continuous time linear equalizer (CTLE). A single-stage CTLE is used in the receiver analog path. It performs frequency-dependent gain, selectively suppressing low-frequency signal components and compensating for high-frequency attenuation caused by skin effect and dielectric losses in the package and board traces.
- Decision feedback equalization (DFE). A one-tap decision feedback equalizer (1-tap DFE) is used to eliminate inter-symbol interference from previous symbols. It subtracts the residual interference voltage from the current sample based on the detected level of the previous symbol, without amplifying noise, which is critical for multi-level modulation.
- Quad data rate (QDR). The clocking subsystem supports not only DDR but also Quad Data Rate mode. In this mode, data capture occurs on both edges of the differential write clock (WCK), operating at double frequency, allowing 24 Gbps streams to be synchronized without extreme system clock frequencies.
- Interface training and strobe centering. For reliable operation at speeds above 20 Gbps, a comprehensive training procedure is implemented. The memory controller calibrates command/address timing, aligns the write clock (WCK) to the command clock (CK) through WCK2CK training, and adjusts phase relationships for each data bus bit in read and write modes.
- Receiver reference voltage (Vref Training). Unlike a fixed threshold, GDDR6X uses independent reference voltage adjustment for each data pin (RX Vref training). This compensates for DC offsets in signal levels on different lines caused by trace non-uniformity and load asymmetry.
- Bus inversion (DBI/CABI). To reduce simultaneous switching of many lines and power supply voltage drops, Data Bus Inversion and Command/Address Bus Inversion are used. The working principle is based on limiting the number of lines transitioning to the active state at any given moment.
- Cyclic redundancy check (CRC). Error detection is provided by a cyclic redundancy check mechanism for read and write operations. Checksums are written with data and verified by the receiver, allowing detection of failures caused by high-energy particle impact or signal degradation.
- Thermal management and peak junction temperature. High signal packing density leads to significant thermal power dissipation (2.5–3 W per chip) over a small die area. Built-in sensors monitor junction temperature, with a maximum specified value of 110°C. When approaching the threshold, throttling is activated to prevent degradation.
- Thermal density management. Intensive heat removal depends on low thermal resistance of the package and board. The difference between die temperature and PCB surface temperature can reach 20 degrees, which must be considered when designing graphics card cooling systems and VRM module placement.
- Refresh modes. The memory controller supports standard SDRAM mechanisms: Auto Refresh and Self-Refresh low-power mode. These preserve data integrity in memory banks when the GPU enters power-saving idle states.
- On-die termination (ODT). Built-in termination resistors for data and strobe lines are dynamically adjustable. Precise impedance matching of the receiver to the line characteristic impedance minimizes signal reflections and improves eye diagram integrity without external passive components.
- Pseudo open drain (POD-135). The I/O interface operates in pseudo open drain mode with a 1.35 V supply voltage. Termination is pulled up to VDDQ, allowing signals with smaller voltage swings and thus reducing dynamic power compared to traditional SSTL push-pull buffers.
- Driver current calibration. Automatic output driver impedance calibration compensates for temperature and supply voltage variations. The process adjusts output transistor current so that PAM4 voltage levels precisely match specified amplitude values, preventing eye diagram closure.
- Low power modes. Specialized deep sleep states are provided, where internal clock trees and strobe receivers are disabled, reducing quiescent current to a minimum. Exiting these modes requires re-initialization and calibration of training circuits for synchronization with the controller.
- Vendor identification and configuration. GDDR6X chips provide access to Vendor ID registers and configuration fuses via a standardized interface. This allows the GPU to automatically determine the type, density (8 Gb or 16 Gb per chip), width (x8/x16), and timing parameters of the installed memory and configure the optimal operating mode.
- Flexible bank architecture. The memory organization includes 16 internal banks per independent channel, grouped into bank groups. This hierarchy allows efficient interleaving of access operations, hiding row precharge and activation latencies, which is critically important for processing graphics texture streams.
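Two items from the list above lend themselves to a short numerical check: the 9.5 dB eye-height penalty of PAM4 and the way a one-tap DFE removes inter-symbol interference. The Python sketch below is a toy model under stated assumptions, not the actual receiver design: the normalized levels in `PAM4`, the single post-cursor channel with coefficient 0.4, and all function names are hypothetical.

```python
import math

def pam4_eye_penalty_db():
    """Eye-height loss of PAM4 vs NRZ at equal swing: each eye is 1/3 as tall."""
    return 20 * math.log10(3)

PAM4 = (-3, -1, 1, 3)  # normalized symbol levels (assumption)

def slice_pam4(v):
    """Decide which nominal level a sample is closest to."""
    return min(PAM4, key=lambda lvl: abs(lvl - v))

def channel(tx, post_cursor):
    """Toy channel: each symbol leaks a fraction of itself into the next one."""
    rx, prev = [], 0
    for s in tx:
        rx.append(s + post_cursor * prev)
        prev = s
    return rx

def dfe_receive(rx, tap):
    """1-tap DFE: subtract tap * previous decision, then slice."""
    out, prev = [], 0
    for v in rx:
        d = slice_pam4(v - tap * prev)
        out.append(d)
        prev = d
    return out

tx = [3, -3, 1, -1, 3, 3, -3, -1]
rx = channel(tx, post_cursor=0.4)
plain = [slice_pam4(v) for v in rx]      # slicing without equalization
equalized = dfe_receive(rx, tap=0.4)     # slicing with the DFE

print(round(pam4_eye_penalty_db(), 2))   # 9.54
print(plain == tx, equalized == tx)      # False True
```

In this idealized case the DFE tap exactly matches the channel's post-cursor coefficient, so every symbol is recovered. In a real link the tap would have to be found through training and would only approximate the channel.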
Comparisons
- GDDR6X vs GDDR6. The key difference lies in the signal transmission method. GDDR6 uses traditional binary NRZ signaling (two signal levels, one bit per symbol), while GDDR6X transitions to multi-level PAM4 modulation (four signal levels, two bits per symbol). This engineering solution allows GDDR6X to transmit twice as much data without proportionally increasing clock frequency, but the price for this density is increased noise sensitivity and the need for complex error detection mechanisms.
- GDDR6X vs HBM2E. This contrasts bandwidth and form factor. GDDR6X achieves outstanding per-chip speed, placed discretely around the GPU, which is cheaper and easier to manufacture. HBM2E offers an extremely wide bus (up to 1024 bits) and low power consumption through 3D stacking near the die. HBM wins in efficiency for AI accelerators, while GDDR6X dominates in gaming graphics cards with limited substrate area budgets.
- GDDR6X vs GDDR5X. GDDR6X is the evolutionary successor to GDDR5X, which still relied on binary NRZ signaling; PAM4 modulation first reached mass-market graphics memory with GDDR6X. The difference lies in technology maturity: GDDR6X raises the data rate ceiling to 19-21 Gbps, unattainable for its predecessor. Additionally, a deeper prefetch than the 8n scheme of earlier GDDR generations keeps the internal memory core frequency low relative to the interface, improving signal stability at extreme speeds.
- GDDR6X vs ECC memory (server class). This comparison concerns data integrity. Server ECC memory (DDR5 ECC) corrects single-bit errors in hardware, which is critical for fault-intolerant computing. Although GDDR6X is not full-fledged ECC memory in the classical sense, it has to implement error detection and data retransmission mechanisms because PAM4 decoding is so difficult. This built-in protection level makes GDDR6X a compromise between pure rendering speed and engineering reliability of high-speed transmission.
- GDDR6X vs Infinity Cache (AMD). This is a comparison of architectural philosophies: raw speed versus efficiency. GDDR6X relies on enormous peak bandwidth of the external bus. Infinity Cache, conversely, uses a large on-die L3 cache to reduce accesses to slower GDDR6, cutting power consumption for long-distance data transfer. GDDR6X wins at high resolutions that load the bus, while large cache is more effective at reducing latency with a narrow memory bus.
Security
Hardware protection is implemented through critical temperature monitoring. Chips have built-in Tjunction sensors, which the graphics card firmware polls so that it can dynamically reduce frequencies and avoid silicon degradation. According to research, peak values can reach 104–110°C, after which throttling activates until the memory returns to a safe thermal range.
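The polling-and-throttling behavior described above can be sketched as a simple control step with hysteresis. This is an illustrative model only, not the actual firmware logic: the 110°C limit comes from the text, while the resume threshold, the step size, the clock values, and the function name `next_clock` are hypothetical.

```python
T_LIMIT_C = 110      # maximum junction temperature quoted in the text
T_RESUME_C = 100     # hypothetical temperature at which clocks may rise again
STEP_MHZ = 50        # hypothetical clock adjustment per polling interval

def next_clock(temp_c, clock_mhz, min_mhz, max_mhz):
    """Return the memory clock to use for the next polling interval."""
    if temp_c >= T_LIMIT_C:
        return max(min_mhz, clock_mhz - STEP_MHZ)   # too hot: back off
    if temp_c <= T_RESUME_C:
        return min(max_mhz, clock_mhz + STEP_MHZ)   # cooled down: recover
    return clock_mhz                                # in between: hold

clock = 1200  # hypothetical starting clock in MHz
for temp in (104, 110, 112, 106, 99):
    clock = next_clock(temp, clock, min_mhz=800, max_mhz=1200)
    print(temp, clock)
```

The gap between the two thresholds keeps the clock from oscillating when the temperature hovers near the limit.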
Logging
GDDR6X status is monitored through the GPU internal sensor loop, which directly reads the Tjunction value of the hottest memory chip. However, this information is not exposed through standard user-facing interfaces such as NVML, and specialized engineering software is required to extract it, which limits detailed logging at the user level.
Limitations
Key limitations include high heat dissipation (about 2.5–3 W per chip), which holds shipping data rates to 19 Gbps instead of the theoretical 21 Gbps; the lack of officially published specifications for maximum temperatures; and the effect of PCB trace complexity on the stability of PAM4 data transmission.
History and development
GDDR6X became the industry's first production DRAM with four-level pulse amplitude modulation (PAM4), replacing the traditional binary NRZ signal. After more than a decade of research and 45 patents, this allowed Micron to double bandwidth to 84 GB/s per chip and reach a system bandwidth of up to 1 TB/s on the GeForce RTX 3080/3090 cards developed in partnership with NVIDIA.
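The bandwidth figures quoted here follow from simple arithmetic: per-pin data rate times bus width, divided by eight bits per byte. The short Python check below assumes a 32-bit interface per chip (consistent with the 96 GB/s at 24 Gbps quoted earlier) and, for the system figure, a 384-bit bus built from twelve such chips; the function name is hypothetical.

```python
def bandwidth_gb_s(pin_rate_gbps, bus_width_bits):
    """Peak bandwidth in GB/s: per-pin rate times bus width, 8 bits per byte."""
    return pin_rate_gbps * bus_width_bits / 8

per_chip = bandwidth_gb_s(21, 32)    # one 32-bit GDDR6X chip at 21 Gbps
system = bandwidth_gb_s(21, 384)     # assumption: 384-bit bus (12 chips)
print(per_chip, system)              # 84.0 1008.0
```

At the 19 Gbps shipping rate mentioned under Limitations, the same 384-bit bus would deliver 912 GB/s, which is why real cards land close to, rather than at, one terabyte per second.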