x64 is an evolution of the classic x86 processor instruction set that allows a computer to address huge amounts of RAM (theoretically up to 16 exabytes) and process 64-bit numbers in a single operation, while maintaining compatibility with 32-bit software.
The x64 architecture (also known as x86-64, AMD64, or Intel 64) is the dominant standard for desktop computers, workstations, laptops, and servers based on Intel and AMD processors. It is supported by the major modern operating systems, such as Windows 11, macOS, Linux, and FreeBSD. It runs high-load databases, video editing and scientific calculation tools, as well as game engines where floating-point computation speed and memory access beyond the 4 GB limit are critically important.
Typical x64 problems
The main problem during the transition to x64 was incompatibility with old 16-bit applications and drivers: virtual 8086 mode is not available in Long Mode, so real-mode programs cannot run natively under a 64-bit operating system. Binaries compiled for 64 bits often take up more space on disk and in RAM because pointers double in size, which increases cache pressure and can slightly reduce performance. Porting legacy code requires care wherever variable widths change, for example when a pointer or a size is stored in a 32-bit integer, to avoid hidden truncation and overflow.
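The hidden-overflow risk during porting can be illustrated with a minimal sketch. It uses Python's `ctypes` fixed-width types to mimic a narrowing assignment in C; the helper name `store_in_int32` and the sample values are illustrative assumptions, not taken from any particular codebase.

```python
import ctypes


def store_in_int32(value: int) -> int:
    """Mimic assigning a 64-bit quantity to a 32-bit signed C variable."""
    # ctypes performs no overflow checking, so the upper 32 bits are
    # silently dropped, as in a narrowing assignment in C.
    return ctypes.c_int32(value).value


# A buffer size just above 4 GiB, as a 64-bit program might compute it.
size_64 = 4 * 1024**3 + 5
print(hex(size_64), "stored in int32 becomes", store_in_int32(size_64))

# A value with bit 31 set turns negative when viewed as a signed 32-bit int.
print(hex(0x80000000), "stored in int32 becomes", store_in_int32(0x80000000))

# Pointer width of the interpreter running this sketch (8 on a 64-bit build).
print("pointer size in bytes:", ctypes.sizeof(ctypes.c_void_p))
```

In real C or C++ code the same effect appears when a `size_t` or pointer difference is stored in an `int`, which is why compilers offer narrowing-conversion warnings for 64-bit ports.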
How x64 works
x64 is based on Long Mode, which is activated by setting the LME flag in the EFER model-specific register and then enabling paging. Unlike pure 32-bit Protected Mode, where general-purpose registers (EAX, EBX, and others) are limited to 32 bits, x64 expands them to 64-bit counterparts (RAX, RBX) and doubles the available set to 16 registers (R8–R15 have been added). The key difference from its predecessors is the addressing mechanism. While 32-bit x86 without PAE used a two-level page table with 4-kilobyte pages, x64 implements a four-level traversal, mapping 48-bit virtual addresses to physical ones. This provides a 256 TB virtual address space instead of the 4 GB limit. The instruction set has also been reworked: the RIP-relative addressing mode makes position-independent code more efficient, because code no longer has to materialize its own address in a register to reach the global offset table, as 32-bit x86 code did. Unlike the competing IA-64 (Itanium) architecture, which relied on explicit instruction parallelism and gave up native compatibility with x86, x64 chose the path of extending the existing CISC instruction set, which modern implementations decode into RISC-like micro-operations, preserving backward compatibility and ensuring a smooth industry transition.
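The four-level traversal can be sketched as plain bit arithmetic. The snippet below is a minimal illustration, assuming 4 KiB pages and 9-bit indexes per level; the function name and the level labels (`pml4`, `pdpt`, `pd`, `pt`) are chosen for the example.

```python
PAGE_OFFSET_BITS = 12                    # 4 KiB pages
INDEX_BITS = 9                           # 512 eight-byte entries per table
LEVELS = ("pml4", "pdpt", "pd", "pt")    # top level first


def split_virtual_address(va: int) -> dict:
    """Split a 48-bit virtual address into four table indexes and an offset."""
    index_mask = (1 << INDEX_BITS) - 1
    parts = {}
    # The top-level index starts at bit 39; each lower level is 9 bits below.
    shift = PAGE_OFFSET_BITS + INDEX_BITS * (len(LEVELS) - 1)
    for name in LEVELS:
        parts[name] = (va >> shift) & index_mask
        shift -= INDEX_BITS
    parts["offset"] = va & ((1 << PAGE_OFFSET_BITS) - 1)
    return parts


# 4 levels * 9 bits + 12 offset bits = 48 translated address bits.
VA_BITS = PAGE_OFFSET_BITS + INDEX_BITS * len(LEVELS)
VIRTUAL_SPACE_TIB = (1 << VA_BITS) // (1 << 40)

print(split_virtual_address(0x00007FFFFFFFF000))
print(VA_BITS, "bits of virtual address =", VIRTUAL_SPACE_TIB, "TiB")
```

The same arithmetic explains the 256 TB figure: 2^48 bytes is 256 TiB, compared with 2^32 bytes (4 GiB) for a 32-bit linear address.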
x64 functionality
- Long Mode and sub-modes. The processor enters Long Mode by enabling PAE in CR4, loading CR3 with the address of the top-level page table, setting the LME bit in the EFER register, and then enabling paging in CR0; the hardware then sets the LMA bit to indicate that Long Mode is active. Long Mode has two sub-modes, 64-bit mode and compatibility mode, selected by the L bit of the current code segment descriptor. 64-bit mode radically changes the interpretation of the virtual address space, canceling segment limits for code and data.
- Canonical addresses and unused bits. With four-level paging, virtual addresses are limited to 48 significant bits, sign-extended to 64 bits. Any attempt to reference a non-canonical address, where bits 63–48 do not match bit 47, triggers a General Protection Fault (#GP), or a Stack Fault (#SS) for stack references, protecting the integrity of the address space.
- Flat memory model and segmentation. In 64-bit mode, the base and limit fields of the CS, DS, ES, and SS segment registers are ignored and the base is treated as zero, creating a flat address space. Hardware limit checking is not performed. The FS and GS registers are the exception for base addresses: their bases are still applied and are programmed through dedicated MSRs, typically to address thread-local data.
- Register file and zero extension. The architecture adds eight integer registers (R8–R15) and eight 128-bit XMM registers (XMM8–XMM15). Any 32-bit operation on a general-purpose register in this mode implicitly zeros the upper 32 bits of the corresponding 64-bit register, which breaks false dependencies on the previous value. 8-bit and 16-bit operations leave the upper bits unchanged.
- Unmodified default operand size. The default operand size for most instructions in 64-bit mode is 32 bits, not 64. To explicitly specify 64-bit operations, the REX (register extension) prefix is used, which also encodes access to the additional R8–R15 and XMM8–XMM15 registers and to the low-byte registers SPL, BPL, SIL, and DIL.
- Instruction encoding and the REX prefix. The REX prefix (range 40h–4Fh) is placed immediately before the opcode, after any legacy prefixes. Its four flags extend the ModR/M and SIB fields: the W bit sets the 64-bit operand size; the R bit extends the reg field; the X bit extends the index; the B bit extends the base or r/m, allowing the entire extended register file to be uniquely addressed.
- RIP-relative addressing for data. Instructions have gained a new addressing mode relative to the instruction pointer (RIP). A 32-bit signed offset is added to the address of the next instruction to calculate the effective address. This fundamentally simplifies the generation of position-independent code and access to global data tables in shared libraries.
- Elimination of single-byte INC/DEC encoding. The single-byte opcodes 40h–4Fh, previously assigned to the INC and DEC instructions in 32-bit mode, were repurposed as REX prefixes. As a result, in 64-bit mode, these instructions are available only in the two-byte encoding form, eliminating prefix collision.
- System calls via the SYSCALL instruction. The fast SYSCALL instruction transitions to ring 0. It saves the address of the next instruction in RCX and RFLAGS in R11, masks RFLAGS with the FMASK MSR, then loads RIP from the LSTAR MSR and the CS and SS selectors from the STAR MSR. The stack pointer is not switched, so the kernel entry code must load its own RSP. SYSRET returns to user mode, restoring RIP from RCX and RFLAGS from R11 and deriving the user segment selectors from bits 63–48 of STAR.
- Hardware stack and its alignment. The common x64 calling conventions require the stack pointer RSP to be 16-byte aligned immediately before a CALL instruction. This is a software (ABI) rule rather than a hardware check, but violating it can cause aligned SIMD instructions such as MOVAPS that access stack data to raise a General Protection Fault (#GP). When delivering an interrupt or exception in 64-bit mode, the processor itself aligns the new RSP to a 16-byte boundary before pushing the stack frame.
- Parameter passing in the x64 convention. The standard Microsoft x64 calling convention passes the first four arguments in registers, chosen by argument position: RCX, RDX, R8, and R9 for integer and pointer arguments, and XMM0–XMM3 for floating-point arguments. The caller always reserves 32 bytes of shadow space on the stack for these four register parameters; any further arguments are placed on the stack above the shadow space in right-to-left order.
- Exceptions and interrupt masking. The interrupt mechanism uses a 64-bit Interrupt Descriptor Table (IDT) with extended 16-byte gate descriptors. The Interrupt Stack Table (IST) in the 64-bit TSS holds up to seven stack pointers; a gate that names an IST entry forces a switch to that known good stack, so critical exceptions such as double faults can be handled even when the current RSP is corrupted.
- Hardware virtualization and guest mode. The VMX extension uses a Virtual Machine Control Structure (VMCS) with 64-bit fields to control VM entry and VM exit. The host-state fields that are loaded into pointer registers must contain canonical addresses. If a host-state field such as a segment base holds a non-canonical address, the VM entry fails its host-state checks and is not performed.
- NX bit and page protection. In the page tables used in Long Mode, which extend the PAE format, bit 63 of an entry (Execute Disable) controls in hardware whether instructions may be fetched from the page; it takes effect when the NXE bit in EFER is set. An attempt to execute code from a page with the NX bit set triggers a Page Fault (#PF) whose error code marks the access as an instruction fetch.
- Segment descriptor cache. The processor caches the hidden parts of segment registers (base, limit, and attributes). In 64-bit mode, the cached base of CS, DS, ES, and SS is ignored and treated as zero, so no hidden base offset can affect address calculation. The FS and GS bases are the exception: they hold full 64-bit values, written explicitly to the FS and GS base MSRs via the WRMSR instruction.
- Address translation and five-level paging. The LA57 extension supplements the 4-level paging scheme with a fifth level (PML5). Activation allows the processor to address a 57-bit linear space (128 PB). When the LA57 bit in CR4 is set, CR3 points to a PML5 table instead of a PML4 table, and canonical addresses sign-extend bit 56 rather than bit 47.
- Atomic 128-bit operations. The CMPXCHG16B instruction performs a compare-and-exchange on 16 bytes of memory. It compares the memory operand with RDX:RAX; if they are equal, it stores RCX:RBX to memory, otherwise it loads the memory value into RDX:RAX. With the LOCK prefix and a 16-byte-aligned operand the operation is atomic, which allows updates to large structures without mutexes and is critical for lock-free algorithms.
- Context saving and restoring. The FXSAVE64 instruction saves the state of the x87 FPU, MMX, and XMM registers to a 512-byte, 16-byte-aligned memory area, and FXRSTOR64 restores it. The layout of the saved image depends on the operand size: the 64-bit form stores the FPU instruction and data pointers as 64-bit offsets rather than as selector and 32-bit offset pairs.
- TLB tagging with PCID. The PCID (Process Context Identifiers) mechanism tags TLB entries with a 12-bit identifier, which allows the kernel to switch the virtual address space without completely flushing the translation lookaside buffer. PCID can be enabled only in IA-32e mode (Long Mode).
- Control-flow Enforcement Technology (CET). Indirect Branch Tracking (IBT) uses the ENDBR64 instruction as a valid target marker. An indirect CALL or JMP that does not land on ENDBR64 triggers a Control Protection exception (#CP), blocking JOP attacks at the hardware level, while CET shadow stacks protect return addresses against ROP attacks.
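Three of the encoding rules from the list above (the REX prefix layout, RIP-relative address calculation, and zero extension on 32-bit writes) can be modeled with a few lines of integer arithmetic. This is a simplified sketch for illustration only; the function names are hypothetical and nothing here assembles or executes real machine code.

```python
MASK64 = (1 << 64) - 1


def rex_prefix(w: int = 0, r: int = 0, x: int = 0, b: int = 0) -> int:
    """Build a REX byte; its bit layout is 0100WRXB (40h-4Fh)."""
    return 0x40 | (w << 3) | (r << 2) | (x << 1) | b


def rip_relative_target(instr_addr: int, instr_len: int, disp32: int) -> int:
    """Effective address = address of the next instruction + signed disp32."""
    if disp32 & 0x8000_0000:          # reinterpret the raw field as signed
        disp32 -= 1 << 32
    return (instr_addr + instr_len + disp32) & MASK64


def write32(old_reg64: int, value: int) -> int:
    """A 32-bit register write zeros the upper 32 bits of the 64-bit register."""
    return value & 0xFFFF_FFFF


def write16(old_reg64: int, value: int) -> int:
    """A 16-bit register write leaves the upper 48 bits unchanged."""
    return (old_reg64 & ~0xFFFF & MASK64) | (value & 0xFFFF)


# REX.W alone is 48h; REX.W plus REX.B (r/m field names R8-R15) is 49h.
print(hex(rex_prefix(w=1)), hex(rex_prefix(w=1, b=1)))

# A 7-byte instruction at 0x401000 with displacement +0x2000.
print(hex(rip_relative_target(0x401000, 7, 0x2000)))

# Writing EAX clears the upper half of RAX; writing AX does not.
rax = 0xFFFF_FFFF_FFFF_FFFF
print(hex(write32(rax, 0x1234)), hex(write16(rax, 0x1234)))
```

For instance, the byte sequence `49 89 C0` (`mov r8, rax`) starts with the prefix that `rex_prefix(w=1, b=1)` produces: W selects the 64-bit operand size and B extends the r/m field to reach R8.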
Comparisons
- x64 vs IA-64 (Intel Itanium Architecture). x64 represents an evolutionary extension of the x86 instruction set while maintaining backward compatibility, implementing the AMD64 model. IA-64, in contrast, is a radically new Explicitly Parallel Instruction Computing (EPIC) architecture that completely breaks ties with the x86 legacy. The fundamental difference lies in the strategy: x64 bet on a smooth transition and support for existing software, whereas IA-64 required complete recompilation and specific code optimization, which ultimately led to its commercial failure in the mass market.
- x64 vs ARM64 (AArch64). The x64 architecture traditionally follows the CISC philosophy with complex variable-length instructions decoded into micro-operations, optimizing performance per clock for high-load computing. ARM64 uses a RISC approach with fixed-length instructions, originally designed for maximum energy efficiency. This difference defines the areas of dominance: x64 historically leads in servers and desktop systems with a high thermal envelope, while ARM64 dominates in mobile and embedded solutions and is expanding into the server segment thanks to its performance per watt.
- x64 Long Mode vs x86 Protected Mode. Long Mode dramatically expands the capabilities of legacy Protected Mode by introducing 64-bit integer registers, doubling the number of general-purpose and SSE registers, and providing a flat addressing model with 256 TB of virtual address space under four-level paging. x86 Protected Mode is limited to 32-bit registers and a 4 GB linear address space; Physical Address Extension (PAE) raises the physical memory limit but not the 4 GB of virtual address space available to each process. Switching to Long Mode lets a single process address large amounts of RAM directly, without windowing schemes built on PAE, which is critically important for modern databases and virtualization systems.
- x64 SYSCALL vs x86 SYSENTER/SYSEXIT. To execute fast system calls in x64, the SYSCALL instruction is used, which differs from the SYSENTER and SYSEXIT pair used in 32-bit mode. SYSCALL switches the privilege level by loading the target address and code segment from specialized model-specific registers (MSRs) without memory access. Unlike SYSENTER, which also loads the kernel stack pointer from an MSR, SYSCALL leaves RSP unchanged, so the kernel entry code switches stacks itself. Both mechanisms provide lower kernel-entry latency than the outdated method of software interrupts, which must read gate descriptors and stack pointers from memory.
- x64 AVX-512 vs x86 SSE2. The extension of the x64 instruction set to AVX-512 represents an evolution of vector processing compared to the basic SSE2 extension. AVX-512 operates on 512-bit ZMM registers, allowing sixteen single-precision floating-point operations to be processed in a single instruction versus four using 128-bit XMM registers in SSE2. Besides the increased bit width, AVX-512 introduces per-lane predication through dedicated mask registers, which allows conditional code to be vectorized without expensive branching operations, significantly boosting performance in machine learning and cryptography tasks.
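The lane counts and the effect of per-lane masking can be illustrated with a small model. The sketch below emulates merge-masking semantics on plain Python lists; it is not real SIMD code, and the function names are assumptions made for the example.

```python
def lanes(register_bits: int, element_bits: int = 32) -> int:
    """Number of elements of the given width that fit in one vector register."""
    return register_bits // element_bits


def masked_add(dst: list, a: list, b: list, mask: int) -> list:
    """Model merge-masking: lane i receives a[i] + b[i] when mask bit i is set,
    otherwise it keeps the previous value of dst[i]."""
    return [
        x + y if (mask >> i) & 1 else old
        for i, (old, x, y) in enumerate(zip(dst, a, b))
    ]


# Single-precision (32-bit) lanes per register: 128-bit XMM vs 512-bit ZMM.
print(lanes(128), lanes(512))

# Only lanes 0 and 2 are updated; lanes 1 and 3 keep their old contents.
print(masked_add([0, 0, 0, 0], [1, 2, 3, 4], [10, 20, 30, 40], 0b0101))
```

A loop body containing an `if` can be vectorized this way: the condition is computed for all lanes into a mask, and the masked operation updates only the lanes where it holds.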
OS and driver support
On 64-bit Windows, x64 support rests on two mechanisms. The first is mandatory digital signing of kernel-mode drivers: the operating system loader verifies each driver using cryptographic hashes and a public key infrastructure before loading the module into memory. The second is the WOW64 compatibility layer, which provides a 32-bit environment for legacy applications by translating system calls and switching code segments between the 64-bit and compatibility sub-modes of the processor.
Security
Hardware security rests on three mechanisms. The NX bit in the address translation tables prohibits code execution on marked memory pages, which prevents classic buffer overflow attacks that inject code into the stack or heap. SMEP makes the processor raise a page fault when code running in ring 0 attempts to execute instructions from user-space pages. Shadow stacks protect return addresses from modification.
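The interaction of NX and SMEP can be sketched as a permission check on a single page-table entry. This is a deliberately simplified model: it assumes EFER.NXE is set, looks only at the final-level entry (real hardware combines permissions from every level of the walk), and uses a hypothetical function name.

```python
PTE_PRESENT = 1 << 0    # P bit: the page is mapped
PTE_USER = 1 << 2       # U/S bit: the page is accessible from user mode
PTE_NX = 1 << 63        # Execute Disable bit


def instruction_fetch_faults(pte: int, cpl: int, smep: bool = True) -> bool:
    """Return True if fetching an instruction from this page would fault."""
    if not pte & PTE_PRESENT:
        return True                     # page not mapped
    if pte & PTE_NX:
        return True                     # execution disabled for this page
    user_page = bool(pte & PTE_USER)
    if cpl == 3 and not user_page:
        return True                     # user code fetching a supervisor page
    if cpl < 3 and user_page and smep:
        return True                     # SMEP: supervisor fetching a user page
    return False


data_page = PTE_PRESENT | PTE_USER | PTE_NX
code_page = PTE_PRESENT | PTE_USER

print(instruction_fetch_faults(data_page, cpl=3))               # NX blocks it
print(instruction_fetch_faults(code_page, cpl=3))               # allowed
print(instruction_fetch_faults(code_page, cpl=0))               # SMEP blocks it
print(instruction_fetch_faults(code_page, cpl=0, smep=False))   # allowed
```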
Logging
Hardware-level logging of execution is implemented through the Intel PT (Processor Trace) extension, which records control flow in hardware and writes compressed packets with timestamps and branch information into a dedicated physical memory buffer, from which the executed instruction stream can be reconstructed. At the operating system level, Windows uses Event Tracing for Windows (ETW), which activates built-in probes in kernel and driver code to record structured events in circular buffers without stopping execution.
Limitations
A fundamental limitation of most current x64 implementations is that only 48 bits of the theoretically available 64-bit virtual address space are used. This leads to a canonical division of addresses into lower and upper halves, with mandatory sign extension of bit 47 into the upper bits. Processors with five-level paging widen the space to 57 bits, at the cost of an extra level in the page table hierarchy during address translation.
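The canonical-address rule can be expressed as a short check. The following is a minimal sketch; the function name is illustrative, and the `va_bits` parameter is an assumption added so that the same check covers both 48-bit and 57-bit address widths.

```python
MASK64 = (1 << 64) - 1


def is_canonical(addr: int, va_bits: int = 48) -> bool:
    """True if all bits from (va_bits - 1) up to bit 63 are identical."""
    addr &= MASK64
    upper = addr >> (va_bits - 1)              # sign bit plus every bit above it
    all_ones = (1 << (64 - va_bits + 1)) - 1
    return upper == 0 or upper == all_ones


LOWER_HALF_END = 0x00007FFFFFFFFFFF     # last address of the lower half
UPPER_HALF_START = 0xFFFF800000000000   # first address of the upper half

print(is_canonical(LOWER_HALF_END))                   # True
print(is_canonical(0x0000800000000000))               # False: inside the gap
print(is_canonical(UPPER_HALF_START))                 # True
print(is_canonical(0x0000800000000000, va_bits=57))   # True with 57-bit addresses
```

With 48-bit addresses each half spans 2^47 bytes (128 TiB), and everything between the two halves is non-canonical.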
History and development
The architecture was announced by AMD in 1999 as the x86-64 extension and first shipped in Opteron processors in 2003. It introduced a flat memory model that largely eliminates segmentation in Long Mode and increased the number of general-purpose registers from eight to sixteen. Further development included the AVX-512 and Intel AMX extensions, which add new register files and matrix accelerators for vector and tensor computations while preserving backward compatibility with the original 8086.