What is x86 (Execution of instructions based on CISC architecture)

x86 is a family of processor instruction set architectures descended from the Intel 8086. It follows the CISC approach, in which a single instruction can perform a complex, multi-step operation. Think of it as a Swiss army knife: instead of a separate simple blade for each action, you have ready-made built-in mechanisms capable of performing a multi-step task, such as reading from memory and an arithmetic operation, in one instruction (which may still take several clock cycles to execute).

The x86 architecture dominates the segment of personal computers, workstations, and entry-level and mid-range servers. It forms the basis of the vast majority of Intel and AMD processors running under Windows and Linux operating systems. Thanks to backward compatibility, modern systems can still execute program code written decades ago, which is critically important for corporate infrastructure, industrial controllers, and the banking sector.

Typical problems of x86

The main engineering challenge is backward compatibility reaching back to 16-bit mode, which forces the processor to carry a complex decoder for variable-length instructions. The processor spends significant resources translating complex CISC instructions into internal RISC-like micro-operations, which limits energy efficiency in mobile devices compared to ARM. The problem of shadow registers and speculative execution vulnerabilities, such as Meltdown and Spectre, is also prominent.

Operating principle of x86

Unlike RISC architectures that operate with simple fixed-length instructions, x86 implements the concept of complex instruction set computing, where instructions can range from 1 to 15 bytes in length. When the processor receives machine code, its decoder analyzes prefixes, the operation code, and the ModR/M and SIB fields used for memory addressing. The key feature is that at the physical level, modern x86 cores have long ceased to be classical CISC machines. The processor frontend converts complex x86 instructions into sequences of RISC-like micro-operations (uOps), which are then sent to the out-of-order execution scheduler. This allows for pipelining and parallelism at a level comparable to ARM, despite the archaic nature of the input code. A cache of decoded micro-operations lets the processor skip the complex hardware decoding stage when a loop repeats, significantly increasing performance in iterative tasks.
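
As a small illustration of the decoding step described above, the sketch below splits a ModR/M byte into its three bit fields. The function name `split_modrm` is hypothetical and chosen only for this example; a real decoder also handles prefixes, SIB bytes, and displacements.

```python
def split_modrm(byte: int) -> tuple:
    """Split a ModR/M byte into its (mod, reg, rm) bit fields."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("ModR/M is a single byte")
    mod = (byte >> 6) & 0b11   # bits 7..6: addressing form (3 = register direct)
    reg = (byte >> 3) & 0b111  # bits 5..3: register operand or opcode extension
    rm = byte & 0b111          # bits 2..0: register or memory operand
    return mod, reg, rm


# The bytes 89 D8 encode "mov eax, ebx" in 32-bit code; D8 is the ModR/M byte.
print(split_modrm(0xD8))  # (3, 3, 0): register direct, reg = EBX, rm = EAX
```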

x86 functionality

Addressing modes. The x86 architecture provides several memory addressing modes, allowing flexible calculation of an operand’s effective address. The key mode is base-index addressing with displacement, where the address is formed by summing the contents of a base register, an index register multiplied by a scale factor (1, 2, 4, 8), and a constant displacement.
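The base-index-with-displacement calculation can be sketched as follows. This is a minimal model assuming 32-bit addressing; the function name `effective_address` is hypothetical.

```python
def effective_address(base: int, index: int, scale: int, disp: int) -> int:
    """Compute base + index * scale + disp, wrapped to 32 bits."""
    if scale not in (1, 2, 4, 8):
        raise ValueError("scale must be 1, 2, 4, or 8")
    return (base + index * scale + disp) & 0xFFFFFFFF


# Models an operand such as [ebx + esi*4 + 0x10] with EBX = 0x1000 and ESI = 3.
print(hex(effective_address(0x1000, 3, 4, 0x10)))  # 0x101c
```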
Segmented memory model. A logical address in real mode consists of a 16-bit segment selector and an offset. The physical address is calculated by the formula selector multiplied by 16 plus offset. This model provides addressing up to 1 MB, dividing the space into overlapping 64 KB blocks.
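The formula above can be checked with a short sketch. The function name `real_mode_physical` and the `address_bits` parameter are assumptions made for this example; the default of 20 bits models the wraparound of the original 8086 address bus.

```python
def real_mode_physical(segment: int, offset: int, address_bits: int = 20) -> int:
    """Translate a real-mode segment:offset pair to a physical address."""
    if not (0 <= segment <= 0xFFFF and 0 <= offset <= 0xFFFF):
        raise ValueError("segment and offset are 16-bit values")
    # Segment * 16 + offset, truncated to the width of the address bus.
    return ((segment << 4) + offset) & ((1 << address_bits) - 1)


# Two different segment:offset pairs can name the same byte (overlapping segments).
print(hex(real_mode_physical(0x1234, 0x0010)))  # 0x12350
print(hex(real_mode_physical(0x1235, 0x0000)))  # 0x12350
```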
General-purpose registers. The processor contains eight 32-bit registers: EAX, EBX, ECX, EDX, ESI, EDI, EBP, and ESP. For backward compatibility, the lower 16 bits of each are accessible as AX, BX, CX, DX, SI, DI, BP, and SP. The first four of these are further divided into high and low bytes, for example, AH and AL for AX.
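The aliasing between a 32-bit register and its narrower views can be modeled with plain bit masks. The helper names `sub_registers` and `write_al` are hypothetical and exist only for this sketch.

```python
def sub_registers(eax: int) -> dict:
    """Return the 16-bit and 8-bit views that alias the low part of EAX."""
    eax &= 0xFFFFFFFF
    ax = eax & 0xFFFF
    return {"EAX": eax, "AX": ax, "AH": (ax >> 8) & 0xFF, "AL": ax & 0xFF}


def write_al(eax: int, value: int) -> int:
    """Writing AL replaces only the lowest byte; the upper 24 bits are kept."""
    return (eax & 0xFFFFFF00) | (value & 0xFF)


views = sub_registers(0x12345678)
print({name: hex(v) for name, v in views.items()})
print(hex(write_al(0x12345678, 0xFF)))  # 0x123456ff
```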
Instruction pointer. The EIP register stores the offset of the next instruction to be executed in the current code segment. Its direct modification by software is prohibited; however, conditional and unconditional jumps, procedure calls, and returns implicitly modify EIP, ensuring execution flow control.
EFLAGS register. This 32-bit register contains a group of status, control, and system flags. Arithmetic instructions modify the Carry Flag (CF), Parity Flag (PF), Auxiliary Carry Flag (AF), Zero Flag (ZF), Sign Flag (SF), and Overflow Flag (OF). The Direction Flag (DF) controls auto-increment or auto-decrement in string operations.
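To make the status flags concrete, the following sketch models how an 8-bit ADD would set CF, PF, ZF, SF, and OF. It is a simplified software model, not a description of the hardware; the function name `add8_flags` is hypothetical.

```python
def add8_flags(a: int, b: int) -> tuple:
    """Model an 8-bit ADD and return (result, flags)."""
    a &= 0xFF
    b &= 0xFF
    total = a + b
    result = total & 0xFF
    flags = {
        "CF": total > 0xFF,                              # unsigned carry out of bit 7
        "PF": bin(result).count("1") % 2 == 0,           # even number of set bits
        "ZF": result == 0,                               # result is zero
        "SF": bool(result & 0x80),                       # copy of the sign bit
        "OF": bool((a ^ result) & (b ^ result) & 0x80),  # signed overflow
    }
    return result, flags


# 0x7F + 0x01 overflows the signed range: SF and OF are set, CF is clear.
print(add8_flags(0x7F, 0x01))
```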
Interrupt mechanism. The architecture supports up to 256 interrupt vectors. In real mode, the vector table is located at address zero. Each table entry is a 4-byte far pointer to a handler, stored as a 16-bit offset followed by a 16-bit segment. Hardware maskable interrupts are delivered through an interrupt controller (the 8259 PIC or, in later systems, the APIC); the non-maskable interrupt uses vector 2.
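A real-mode vector lookup can be sketched as below. The memory image, the handler address, and the function name `ivt_handler` are made up for the example; only the layout (4 bytes per vector, offset word first, table at address zero) comes from the description above.

```python
import struct


def ivt_handler(memory: bytes, vector: int) -> tuple:
    """Read the far pointer for a vector from a real-mode vector table at address 0."""
    if not 0 <= vector <= 255:
        raise ValueError("vector must be in 0..255")
    # Each entry is 4 bytes: little-endian 16-bit offset, then 16-bit segment.
    offset, segment = struct.unpack_from("<HH", memory, vector * 4)
    physical = ((segment << 4) + offset) & 0xFFFFF
    return segment, offset, physical


# Hypothetical memory image: point vector 2 (NMI) at F000:1234.
memory = bytearray(1024)
struct.pack_into("<HH", memory, 2 * 4, 0x1234, 0xF000)
print([hex(v) for v in ivt_handler(memory, 2)])  # ['0xf000', '0x1234', '0xf1234']
```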
Instruction set. The instruction set includes data transfer operations (MOV, XCHG), binary and decimal arithmetic (ADD, SUB, MUL, DIV), logical and shift operations (AND, OR, XOR, SHL, SHR), bit and byte operations, as well as program control instructions (JMP, CALL, RET).
String operations. The MOVS, CMPS, SCAS, LODS, and STOS instructions perform operations on chains of bytes, words, or double words. They implicitly use the register pair ESI as the source and EDI as the destination. The REP prefix automatically repeats the instruction, decrementing ECX until it becomes zero.
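The behavior of REP MOVSB can be approximated in software as follows. This is a sketch over a flat byte array, ignoring segments and faults; the function name `rep_movsb` is hypothetical.

```python
def rep_movsb(memory: bytearray, esi: int, edi: int, ecx: int, df: bool = False) -> tuple:
    """Copy ECX bytes from [ESI] to [EDI]; return the final (ESI, EDI, ECX)."""
    step = -1 if df else 1  # DF = 1 walks toward lower addresses
    ecx &= 0xFFFFFFFF
    while ecx != 0:
        memory[edi] = memory[esi]
        esi = (esi + step) & 0xFFFFFFFF
        edi = (edi + step) & 0xFFFFFFFF
        ecx -= 1            # REP decrements ECX after every iteration
    return esi, edi, ecx


buf = bytearray(b"hello" + bytes(11))
print(rep_movsb(buf, esi=0, edi=8, ecx=5))  # (5, 13, 0)
print(bytes(buf[8:13]))                     # b'hello'
```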
Stack organization. The stack grows toward lower addresses. The ESP register points to the top of the stack, and EBP is traditionally used as the stack frame base pointer. The PUSH and POP instructions place and retrieve operands, automatically adjusting ESP by the operand size, which can be 2 or 4 bytes.
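The downward-growing stack can be modeled with a few lines of code. The class name `Stack32` is hypothetical, and the sketch handles only 4-byte operands.

```python
import struct


class Stack32:
    """Toy model of a 32-bit stack that grows toward lower addresses."""

    def __init__(self, size: int = 64):
        self.memory = bytearray(size)
        self.esp = size  # empty stack: ESP sits just past the highest address

    def push(self, value: int) -> None:
        if self.esp < 4:
            raise OverflowError("stack overflow")
        self.esp -= 4  # ESP is decremented first, then the operand is stored
        struct.pack_into("<I", self.memory, self.esp, value & 0xFFFFFFFF)

    def pop(self) -> int:
        if self.esp + 4 > len(self.memory):
            raise IndexError("stack is empty")
        (value,) = struct.unpack_from("<I", self.memory, self.esp)
        self.esp += 4  # the operand is read first, then ESP is incremented
        return value


stack = Stack32(16)
stack.push(0x11111111)
stack.push(0x22222222)
print(stack.esp)         # 8
print(hex(stack.pop()))  # 0x22222222
```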
Data alignment. Access to unaligned data in memory is permitted by hardware, but this leads to a significant performance penalty due to multiple bus cycles. To achieve maximum throughput, words should be aligned on even addresses, double words on addresses divisible by four, and quad words on addresses divisible by eight.
Page address translation. In protected mode, paging can be enabled (via the PG bit in CR0) and uses a two-level page hierarchy for 4 KB pages. The linear address is divided into a page directory index (10 bits), a page table index (10 bits), and an offset within the page (12 bits). The CR3 register stores the physical address of the start of the current task’s page directory.
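The 10/10/12 split of a linear address can be sketched as follows, assuming classic 32-bit paging with 4 KB pages. The function name `split_linear` is hypothetical.

```python
def split_linear(address: int) -> tuple:
    """Split a 32-bit linear address into (directory index, table index, offset)."""
    address &= 0xFFFFFFFF
    directory = address >> 22        # top 10 bits select the page directory entry
    table = (address >> 12) & 0x3FF  # next 10 bits select the page table entry
    offset = address & 0xFFF         # low 12 bits are the offset within the page
    return directory, table, offset


print(split_linear(0xC0101234))  # (768, 257, 564)
```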
Segment-level protection. Descriptors in the Global and Local Descriptor Tables define the access rights to a segment, its size, and its base address. The hardware checks the access rights each time a segment selector is loaded into a segment register and checks the segment limit on every memory access through that segment, generating an exception (typically general protection) upon violation.
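A selector's fields and the privilege check for an ordinary data segment can be sketched as below. The function names are hypothetical, and the check shown covers only the data-segment rule (the descriptor's DPL must be numerically greater than or equal to both the current privilege level and the selector's requested privilege level); code and stack segments follow different rules.

```python
def decode_selector(selector: int) -> dict:
    """Split a 16-bit segment selector into index, table indicator, and RPL."""
    selector &= 0xFFFF
    return {
        "index": selector >> 3,                          # descriptor index
        "table": "LDT" if selector & 0b100 else "GDT",   # table indicator bit
        "rpl": selector & 0b11,                          # requested privilege level
    }


def data_access_allowed(cpl: int, rpl: int, dpl: int) -> bool:
    """Data-segment load check: the weaker of CPL and RPL must still satisfy DPL."""
    return max(cpl, rpl) <= dpl


print(decode_selector(0x23))         # {'index': 4, 'table': 'GDT', 'rpl': 3}
print(data_access_allowed(3, 3, 0))  # False: ring 3 cannot load a ring 0 data segment
```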
Protection rings. The architecture defines four hierarchical privilege levels, from zero (most privileged, OS kernel) to three (user applications). Transitions between rings are strictly controlled through call gates, interrupt gates, and task gates to prevent unauthorized access to resources.
Task management. Hardware support for multitasking is implemented through the Task State Segment (TSS). The TSS stores the image of all registers and pointers to stacks for the protection rings. A task switch via a JMP or CALL instruction to a TSS selector loads the processor state from the new segment while saving the current one.
Virtual 8086 mode. A special mode allowing the execution of real-mode applications within a multitasking protected environment. When the VM flag is set in EFLAGS, the processor emulates a 16-bit environment. Hardware interrupts and I/O operations can be intercepted by the virtual machine monitor for simulation.
Cache memory and prefetching. The internal instruction and data caches minimize accesses to RAM. The prefetch unit fetches a sequential instruction stream into a queue ahead of the decoder, relieving the pipeline of fetch delays. In normal operation the caches are kept coherent by hardware; the privileged WBINVD and INVD instructions explicitly flush or invalidate them in the special cases where that is not sufficient, for example when memory cacheability settings are changed.
SMM mechanism. System Management Mode is a privileged state isolated from the OS, entered upon an SMI# signal. The processor saves the context in a special SMRAM memory area and executes the firmware handler. Upon exit via the RSM instruction, the state prior to the interrupt is fully restored.
I/O port operations. The I/O address space is independent of memory and consists of 64K individually addressable 8-bit ports. The IN and OUT instructions transfer data between the AL, AX, or EAX register and a port whose address is given either as an 8-bit immediate or in DX. Port access rights are governed by the I/O Permission Bitmap in the TSS.
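The bitmap lookup can be sketched as follows. In the bitmap, a set bit denies access and a clear bit allows it, and a multi-byte access needs every covered port to be allowed; the hardware consults the bitmap only when the current privilege level is insufficient on its own. The function name `io_allowed` and the sample bitmap are assumptions for this example.

```python
def io_allowed(bitmap: bytes, port: int, width: int = 1) -> bool:
    """Check an I/O Permission Bitmap: every covered port bit must be 0."""
    for p in range(port, port + width):
        if p > 0xFFFF:
            return False
        byte_index, bit = divmod(p, 8)
        if byte_index >= len(bitmap):
            return False  # ports beyond the end of the bitmap count as denied
        if (bitmap[byte_index] >> bit) & 1:
            return False  # a set bit denies access
    return True


# Hypothetical bitmap: deny everything, then open ports 0x3F8..0x3FF.
bitmap = bytearray([0xFF] * 8192)
bitmap[0x3F8 // 8] = 0x00
print(io_allowed(bitmap, 0x3F8))     # True
print(io_allowed(bitmap, 0x3FF, 2))  # False: the access spills into port 0x400
```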
MMX SIMD extension. Multimedia Extensions technology reuses the 80-bit floating-point registers, mapping their lower 64 bits as MM0-MM7 registers. It implements the SIMD paradigm, processing packed integer vectors of eight bytes, four words, or two double words with a single instruction.
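The packed-integer idea can be illustrated with a software model of a wraparound byte addition in the style of the MMX PADDB instruction. A 64-bit Python integer stands in for an MM register; the function name `paddb` is used here only as a label for the sketch.

```python
MASK64 = 0xFFFFFFFFFFFFFFFF


def paddb(a: int, b: int) -> int:
    """Add eight packed bytes lane by lane; carries never cross lane boundaries."""
    a_bytes = (a & MASK64).to_bytes(8, "little")
    b_bytes = (b & MASK64).to_bytes(8, "little")
    summed = bytes((x + y) & 0xFF for x, y in zip(a_bytes, b_bytes))
    return int.from_bytes(summed, "little")


# A scalar add of 0x00FF + 0x0101 gives 0x0200; the packed add wraps the low byte.
print(hex(paddb(0x00FF, 0x0101)))  # 0x100
```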

Comparisons

x86 vs ARM. The x86 architecture is based on a CISC computation model with complex, multi-cycle instructions capable of performing operations directly with memory, whereas ARM uses a RISC philosophy emphasizing simple fixed-length instructions and a strict load-store model. This difference results in high code density for x86 and potential energy efficiency for ARM due to a simplified decoder and predictable pipeline.
x86 vs RISC-V. Unlike the deeply entrenched and backward-compatible x86 architecture, defined by proprietary extensions of specific vendors, RISC-V represents an open standard with a modular instruction set structure. The fundamental advantage of RISC-V lies in its free licensing model, allowing developers to create specialized processors without royalties, while x86 is limited by the licensing barriers of Intel and AMD.
x86 vs MIPS. The x86 architecture was originally designed with microcode in mind, allowing complex instructions to be emulated without complicating the hardware logic, which ensured a smooth transition from 16-bit to 32-bit and 64-bit computing. MIPS, a classical RISC architecture, relied on hardware-implemented instructions and the concept of a pipeline that completes one instruction per cycle, which gave an advantage in clock frequency in the early stages but proved less flexible to scale.
x86 vs Itanium (IA-64). The x86 approach to parallelism relies on hardware dynamic instruction reordering (out-of-order execution) inside the processor, hiding the complexity from the compiler. Itanium implemented the VLIW concept, shifting all responsibility for instruction scheduling and branch prediction onto the static compiler, which led to extreme software development complexity and an inability to efficiently execute unpredictable x86 code, predetermining the platform’s commercial failure.
x86 vs 68k (Motorola 68000). The x86 register model suffered for a long time from an acute shortage of orthogonal general-purpose registers, causing frequent memory and stack accesses, unlike the 68k, where an abundant register file with a flat data addressing model allowed compilers and assembly programmers to store local variables more efficiently. However, the x86 segmented memory model proved more flexible for emulating virtual machines than the flat address space of the m68k.

OS and driver support

The x86 architecture implements operating system support through a hardware multi-level protection ring model (Ring 0–3), where the OS kernel operates at Ring 0 with full access to instructions and memory, and user applications are isolated in Ring 3. Drivers interact with the hardware by mapping physical device addresses into the kernel’s virtual address space using page tables and the Memory-Mapped I/O mechanism, while the special IN/OUT instructions, which access I/O ports and are restricted to privileged code by default (via the IOPL field and the I/O Permission Bitmap), are used for command transmission.

Security

Security in x86 is provided by hardware access control via segment descriptors in the GDT/LDT containing DPL privilege levels, and page protection with Present, Read/Write, and User/Supervisor bits in page table entries. Modern extensions include SMEP (Supervisor Mode Execution Prevention) and SMAP (Supervisor Mode Access Prevention), which prevent kernel-mode code from executing or accessing user-space pages, blocking a class of privilege-escalation attacks in hardware. Intel SGX technology creates hardware-isolated enclaves with memory encryption at the DRAM controller level to protect data even from a compromised OS.
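
The page protection bits mentioned above can be decoded with a short sketch. It assumes a classic 32-bit page table entry for a 4 KB page, where Present, Read/Write, and User/Supervisor are bits 0, 1, and 2; the function name `decode_pte` and the sample entries are hypothetical.

```python
def decode_pte(entry: int) -> dict:
    """Decode the protection bits of a 32-bit page table entry (4 KB page)."""
    return {
        "present": bool(entry & 0x1),   # bit 0: page is mapped
        "writable": bool(entry & 0x2),  # bit 1: Read/Write
        "user": bool(entry & 0x4),      # bit 2: User/Supervisor
        "frame": entry & 0xFFFFF000,    # physical base address of the page
    }


print(decode_pte(0x00123007))  # present, writable, user page at 0x123000
print(decode_pte(0x00123001))  # present, read-only, supervisor-only page
```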

Logging

Hardware logging in x86 is based on the interrupt system and the debugging mechanism via Debug Registers (DR0–DR7), allowing breakpoints to be set on instruction and data addresses with automatic generation of a debug exception (#DB, vector 1) upon trigger. The Intel PT (Processor Trace) extension performs control flow logging with timestamps, writing TNT, TIP, and FUP packets to a dedicated physical memory buffer via the ToPA (Table of Physical Addresses) mechanism with minimal performance impact. Meanwhile, the Last Branch Record stores branch history in Model-Specific Registers (MSRs), accessible only from Ring 0.

Limitations

The fundamental limitations of x86 stem from the legacy of 16-bit real mode, where the boot process starts at address 0xFFFF0 and is limited to 1 MB of address space until switching to protected mode. Backward compatibility with outdated devices is maintained through the emulation of the 8259 PIC interrupt controller and the 8254 PIT timer. Even in 64-bit mode the address space is not a full 64 bits: current implementations cap linear (virtual) addresses at 57 bits with 5-level paging (48 bits with 4-level paging), and physical addresses at no more than 52 bits. Meanwhile, the problem of cache fragmentation persists due to the fixed 64-byte cache line size, as does contention for the APIC during inter-processor interrupt handling.

History and development

The evolution of x86 began with the 16-bit Intel 8086 in 1978, featuring segment:offset addressing and a set of fourteen 16-bit registers. It transitioned to 32-bit protected mode in the 80386 with the introduction of page address translation via two-level tables and the Virtual 8086 mechanism for compatibility. It then expanded to the 64-bit Long Mode through the AMD64 architecture with a flat memory model, a doubling of general-purpose registers to 16, and the introduction of RIP-relative addressing. Modern iterations add AVX-512 vector extensions with 512-bit ZMM registers and AMX matrix accelerators for tensor algebra operations using eight TILE registers and the TMUL instruction set.
