RISC (Accelerated execution of simple commands by the processor)

RISC (Reduced Instruction Set Computer) is a processor architecture whose instruction set is intentionally kept minimal and simple. Most instructions are designed to complete in a single clock cycle or, in a pipelined core, at a rate of one per cycle, which speeds up operation and simplifies the internal design of the chip. Complex tasks are solved by combining these primitive operations rather than by using a single cumbersome instruction.

Today RISC architectures dominate mobile and embedded systems. The vast majority of smartphones and tablets run on ARM processors, which are heirs to the RISC philosophy. Additionally, the principles of a reduced instruction set form the basis of the RISC-V architecture, which is gaining popularity in data centers, the Internet of Things, and microcontrollers where energy efficiency is critical.

Typical problems

The main engineering difficulty is dependence on compiler quality and on the memory subsystem. Since a program is assembled from many atomic operations, inefficient machine code generation directly reduces speed. RISC programs also tend to produce larger binaries than their CISC counterparts, which puts more pressure on the instruction cache and requires higher instruction-fetch bandwidth.

How RISC works

The RISC architecture rests on two ideas: pipelined processing and a load-store design. All data operations are performed exclusively on the processor’s fast registers, and relatively slow main memory is reached only through two kinds of instructions: Load (memory to register) and Store (register to memory). This contrasts sharply with the CISC (Complex Instruction Set Computer) philosophy, where a single instruction can read data from memory, perform an arithmetic operation, and write the result back to memory.

Thanks to the orthogonality of the instruction set and its fixed instruction length (for example, 32 bits in classic RISC), the processor’s decoding logic is simple and occupies few transistors. The freed-up chip area goes to an enlarged register file (often 32 or more general-purpose registers instead of 8–16 in x86), which considerably reduces the frequency of memory accesses.

Pipelining in RISC comes close to the ideal: since instructions share the same format and most take the same time to execute, the fetch, decode, execute, and write-back stages overlap so that, absent stalls, the processor completes one instruction every clock cycle. Where a microcoded CISC processor expands a complex instruction into a sequence of micro-operations of varying length, a classic RISC processor executes its instructions directly in hardwired logic. Timing becomes more predictable, which also eases the implementation of superscalar designs with out-of-order execution.
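
The load-store discipline and the ideal pipeline timing described above can be illustrated with a toy model. This is a minimal sketch, not a model of any real instruction set: the instruction names (LOAD, STORE, ADD), the tuple encoding, and the helper names run and ideal_pipeline_cycles are assumptions made for the example, and the four pipeline stages are the ones listed in the text.

```python
# Toy load-store machine: a sketch, not a model of any real ISA.
# Instruction names and the tuple encoding are assumptions for illustration.

def run(program, memory, num_regs=32):
    """Execute a list of instruction tuples; return (registers, instruction count)."""
    regs = [0] * num_regs
    executed = 0
    for op, *args in program:
        if op == "LOAD":        # LOAD rd, addr    : memory -> register
            rd, addr = args
            regs[rd] = memory[addr]
        elif op == "STORE":     # STORE rs, addr   : register -> memory
            rs, addr = args
            memory[addr] = regs[rs]
        elif op == "ADD":       # ADD rd, rs1, rs2 : registers only, no memory access
            rd, rs1, rs2 = args
            regs[rd] = regs[rs1] + regs[rs2]
        else:
            raise ValueError(f"unknown instruction: {op}")
        executed += 1
    return regs, executed


def ideal_pipeline_cycles(num_instructions, stages=4):
    """Cycles for a stall-free pipeline: fill it once, then finish one instruction per cycle."""
    if num_instructions == 0:
        return 0
    return stages + num_instructions - 1


# memory[2] = memory[0] + memory[1]. A CISC-style memory-to-memory add could
# express this as one instruction; the load-store version needs four.
memory = {0: 7, 1: 35, 2: 0}
program = [
    ("LOAD", 1, 0),
    ("LOAD", 2, 1),
    ("ADD", 3, 1, 2),
    ("STORE", 3, 2),
]
regs, count = run(program, memory)
print(memory[2], count, ideal_pipeline_cycles(count))  # prints: 42 4 7
```

The instruction count is higher than in a memory-operand design, but each step is uniform, which is what lets the stages overlap. Real pipelines add stalls for hazards and cache misses that this sketch ignores.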

RISC functionality

  • Minimization of command formats. Classic RISC instructions have a fixed length, usually 32 bits, which eliminates decoding ambiguity. The opcode field sits in a fixed position, so instruction boundaries are known at fetch time and the decoder never has to work out a variable opcode length.
  • Orthogonality of the register file. General-purpose registers in RISC are completely interchangeable for any operations. There is no rigid binding of specific registers to specific instructions, which removes restrictions for the compiler when allocating variables and minimizes data transfer traffic.
  • Implementation of load-store architecture. Memory access is strictly separated: only LOAD and STORE instructions can access main memory. All arithmetic and logical transformations are performed exclusively on register contents, which eliminates side effects of memory modification within computational operations.
  • Pipelining based on execution homogeneity. Thanks to the simplicity and regularity of the instruction set, most instructions pass through the same number of stages. This allows a deep pipeline with simpler hazard detection logic than a microcoded CISC processor needs for instructions that expand into differing numbers of internal micro-operations.
  • Hardware three-address code. Most RISC operations explicitly specify two source registers and one destination register for the result. This model eliminates the implicit use of accumulators and allows preserving source operands without additional copy operations, simplifying scheduling at the compilation stage.
  • Compiler-driven scheduling. Classic in-order RISC designs lack a complex hardware scheduler for reordering instructions. Responsibility for avoiding pipeline interlocks is shifted to the optimizing compiler, which statically reorders instructions during code generation. Modern high-performance RISC cores add out-of-order hardware on top of this but still benefit from well-scheduled code.
  • Predicated execution of instructions. To minimize branch penalties, some RISC instruction sets, such as classic 32-bit ARM, give almost every instruction a condition field. The instruction enters the execution stage, but its result is committed to the architectural state only if the predicate is true, which eliminates short conditional branches.
  • Overlapping register windows. Some designs, notably Berkeley RISC and SPARC, organize a large register file as a sliding window for passing parameters between procedures. When a subroutine is called, a new set of registers is allocated that overlaps with the old one, which avoids saving context to stack memory when passing arguments as long as the call depth fits in the available windows.
  • Delayed branching. In architectures with a branch delay slot, such as MIPS, the instruction immediately following a conditional or unconditional branch is executed in any case, and the branch takes effect one instruction later. The compiler fills this slot with a useful operation that does not depend on the branch outcome.
  • Support for superscalar fetch. The fixed instruction length allows the hardware to read a wide memory word containing several instructions at once and to locate each one without first decoding instruction boundaries. Independent operations can then be issued in parallel to multiple execution units.
  • Atomicity of a register pair. In some implementations, double-word load instructions guarantee that a 64-bit value is loaded atomically from memory into two adjacent registers. This is used to handle 64-bit data on a 32-bit microarchitecture without the value being torn by interrupts or concurrent threads.
  • Independence from microarchitecture. The RISC instruction set deliberately hides chip implementation details. Application code sees no micro-instructions or pipeline-specific state, which preserves binary compatibility of programs when migrating between processor generations with different internal pipeline widths.
  • Fast interrupt context. Instead of automatically saving the entire register file to the stack, some processors switch to a bank of shadow registers (for example, the banked registers of ARM’s FIQ mode). The hardware interrupt handler immediately gets registers of its own without memory operations, reducing the system’s response time to real-time events.
  • Linked multiply-accumulate operations. Multiplication is performed as a single operation that returns a double-word result in a coupled pair of registers. Subsequent products are accumulated into that wide pair, so the loop needs no overflow check on every iteration, which is critical for digital signal processing cores built on RISC architectures.
  • Exclusive memory access. For thread synchronization, pairs of load-linked and store-conditional instructions are used. The processor monitors the address, and the write fails without bus locking if another core intervened in the cache line between the moment of reading and the attempt to update.
  • Unification of integer and address operations. Address arithmetic is performed by the same commands as integer arithmetic, without a separate address adder. This radically simplifies the pipeline, as the effective address calculation phase is combined with the execution stage of the arithmetic logic unit.
  • Bit manipulation with field extraction. Specialized instructions extract, insert, or clear a contiguous range of bits within a register in one clock cycle. Doing this in hardware, without explicit shift-and-mask sequences, accelerates network packet processing and data deserialization.
  • Register zeroing without arithmetic. Instead of using a separate load zero instruction, a hardware-level zero register is used. This register always returns zero when read and ignores writes, which allows synthesizing various pseudo-instructions such as move or compare with zero.
  • Speculative data loading. A prefetch or speculative load instruction starts a fetch from slow memory well before the result is actually used. On a cache miss, the data is requested in parallel with the main computation, and for speculative loads any exception check is postponed until the target register is accessed.
  • Instruction set compression. Frequently used instructions receive 16-bit encodings (as in ARM Thumb or the RISC-V compressed extension) that are expanded into standard 32-bit equivalents at the fetch or decode stage. This makes the code denser with little added decoder complexity, reducing instruction memory energy consumption while preserving the RISC orthogonality of the inner core.
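
Several items in the list above (fixed field positions, three-address operations, the hardwired zero register) can be seen together in a short decoding sketch. The field layout below follows the RISC-V R-type format; the helper names decode_rtype, read_reg, write_reg, and execute_add are assumptions made for this example, and only the ADD instruction is handled.

```python
# Field positions follow the RISC-V R-type format; helper names are illustrative.

def decode_rtype(word):
    """Split a 32-bit R-type instruction word into its fixed-position fields."""
    return {
        "opcode": word & 0x7F,          # bits 6..0
        "rd":     (word >> 7) & 0x1F,   # bits 11..7, destination register
        "funct3": (word >> 12) & 0x07,  # bits 14..12
        "rs1":    (word >> 15) & 0x1F,  # bits 19..15, first source register
        "rs2":    (word >> 20) & 0x1F,  # bits 24..20, second source register
        "funct7": (word >> 25) & 0x7F,  # bits 31..25
    }


def read_reg(regs, index):
    """Register 0 always reads as zero."""
    return 0 if index == 0 else regs[index]


def write_reg(regs, index, value):
    """Writes to register 0 are ignored; other registers hold 32-bit values."""
    if index != 0:
        regs[index] = value & 0xFFFFFFFF


def execute_add(regs, word):
    """Execute one ADD: rd = rs1 + rs2, with both sources left unchanged."""
    f = decode_rtype(word)
    if (f["opcode"], f["funct3"], f["funct7"]) != (0x33, 0, 0):
        raise ValueError("not an ADD instruction")
    write_reg(regs, f["rd"], read_reg(regs, f["rs1"]) + read_reg(regs, f["rs2"]))


regs = [0] * 32
regs[1], regs[2] = 40, 2
execute_add(regs, 0x002081B3)   # add x3, x1, x2
execute_add(regs, 0x00300233)   # add x4, x0, x3 : copies x3 into x4 via the zero register
execute_add(regs, 0x00208033)   # add x0, x1, x2 : result is discarded
print(regs[3], regs[4], regs[0])  # prints: 42 42 0
```

Because every field sits at a fixed bit offset, the register numbers can be extracted before the opcode is fully decoded, which is part of why the decode stage stays short.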

Comparisons

  • RISC vs CISC. The RISC architecture uses fixed instruction length and single-cycle execution for most operations, whereas CISC employs variable-length instructions that are typically implemented by microprogrammed control. This makes pipelining in RISC more predictable and efficient, reduces the overhead of decoding complex commands inherent to CISC, and has historically allowed higher clock frequencies at lower core power consumption.
  • RISC vs VLIW. In superscalar RISC processors, hardware detects data dependencies at runtime and decides which instructions can issue together; out-of-order cores also reorder them dynamically. VLIW shifts the task of identifying parallelism to the compiler, which statically packs several operations into a long instruction word. This simplifies the hardware control logic of VLIW but makes it critically dependent on the quality of static code analysis at the compilation stage.
  • RISC vs EPIC. High-performance RISC cores rely on a superscalar design with speculative execution and branch prediction to extract instruction-level parallelism at runtime. EPIC (Explicitly Parallel Instruction Computing), implemented in Itanium, uses explicit compiler hints in the code about dependencies and about which operations may execute simultaneously. This is intended to reduce hardware complexity compared to such RISC cores but requires fundamentally different approaches to compiler design.
  • RISC vs MISC. The RISC architecture operates with a relatively simple yet extensive set of register instructions, focusing on high clock frequency. MISC (Minimal Instruction Set Computer) strives for extreme minimization of hardware resources, reducing the number of instructions to a few dozen and using a stack-based computation organization. This sacrifices RISC pipeline performance for maximum code compactness and extremely low implementation cost.
  • RISC vs ARM. The classic RISC concept prescribes a uniform instruction format, as in the original MIPS or SPARC architectures. ARM, a commercial implementation of the RISC philosophy, added conditional execution for almost all instructions in its classic 32-bit instruction set and extended the addressing modes with a shifter in the data path. This helped ARM surpass canonical RISC solutions in code density and energy efficiency in embedded systems while retaining the basic simplicity of decoding.
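
The static packing that the RISC vs VLIW comparison assigns to the compiler can be sketched as a greedy pass over a straight-line instruction sequence. This is a simplified illustration under stated assumptions: operations are hypothetical (dest, src1, src2) register triples, the function name pack_bundles is invented for the example, dependencies are checked conservatively (including write-after-read, which some real VLIW machines permit within a bundle), and real compilers use far more elaborate scheduling.

```python
# Each operation is (dest, src1, src2) over register names; all names are illustrative.

def pack_bundles(ops, width=2):
    """Greedy in-order packing: an op joins the current bundle only if a slot is
    free and it has no register dependence on ops already in that bundle."""
    bundles = []
    current = []
    for op in ops:
        dest, *srcs = op
        conflict = any(
            prev[0] in srcs          # read-after-write
            or prev[0] == dest       # write-after-write
            or dest in prev[1:]      # write-after-read (treated conservatively)
            for prev in current
        )
        if conflict or len(current) == width:
            bundles.append(current)  # close the bundle and start a new one
            current = []
        current.append(op)
    if current:
        bundles.append(current)
    return bundles


ops = [
    ("r1", "r8", "r9"),     # independent
    ("r2", "r10", "r11"),   # independent of the first op
    ("r3", "r1", "r2"),     # needs r1 and r2, so it cannot share their bundle
    ("r4", "r12", "r13"),   # independent of the op that writes r3
]
bundles = pack_bundles(ops, width=2)
print(len(bundles))  # prints: 2
```

A superscalar RISC core makes an equivalent decision in hardware on every cycle, which is why the same binary can run on cores of different issue widths, whereas a statically packed bundle bakes the width into the code.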

OS and driver support

Operating system support builds on the strict orthogonality of the instruction set, which lets the compiler generate predictable code for the kernel and its scheduler. Kernel and driver code uses atomic Load-Linked/Store-Conditional instructions as the basis for synchronization, avoiding bus locks, and the uniformity of the register file means interrupt handlers save and restore one kind of general-purpose register rather than a collection of specialized contexts.
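
The Load-Linked/Store-Conditional pattern mentioned above can be modeled in a few lines. This is a single-threaded toy model under stated assumptions: the class Memory, its methods, and the core identifiers are hypothetical, reservations are tracked per address rather than per cache line, and no real hardware interface is implied.

```python
class Memory:
    """Toy model of load-linked / store-conditional reservations."""

    def __init__(self):
        self.cells = {}
        self.reservations = {}   # core id -> reserved address

    def load_linked(self, core, addr):
        """Read a value and record that this core is watching the address."""
        self.reservations[core] = addr
        return self.cells.get(addr, 0)

    def store(self, addr, value):
        """Ordinary store: clears every reservation on this address."""
        self.cells[addr] = value
        self.reservations = {c: a for c, a in self.reservations.items() if a != addr}

    def store_conditional(self, core, addr, value):
        """Write and return True only if this core's reservation survived."""
        if self.reservations.get(core) != addr:
            return False
        self.store(addr, value)
        return True


def atomic_increment(mem, core, addr):
    """Typical LL/SC retry loop: reload and retry until the store succeeds."""
    while True:
        old = mem.load_linked(core, addr)
        if mem.store_conditional(core, addr, old + 1):
            return old + 1


mem = Memory()
mem.store(0x100, 5)

# Core 0 reads, core 1 updates the same address in between, so core 0's store fails.
seen = mem.load_linked(0, 0x100)
atomic_increment(mem, 1, 0x100)
ok = mem.store_conditional(0, 0x100, seen + 1)
result = atomic_increment(mem, 0, 0x100)
print(ok, result)  # prints: False 7
```

The failed store costs only a retry; no lock is held while the value is being computed, which is the property kernel synchronization primitives build on.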

Security

Security benefits from a simpler decode path: fixed instruction length and the absence of microcode leave less room for undocumented instruction behavior. This does not by itself eliminate microarchitectural side channels, since RISC cores with speculative execution remain exposed to them. Memory protection mechanisms (MPU/PMP) isolate regions by checking address ranges and can prohibit code execution on the stack without additional virtualization costs.
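
The address-range check performed by an MPU or PMP can be sketched as a lookup over a small table of regions. This is a simplified model, not the register encoding of any real protection unit: the Region type, the access_allowed function, and the addresses are assumptions chosen for illustration. The first matching region decides the outcome, and addresses outside every region are denied.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    base: int
    size: int
    perms: str   # any combination of "r", "w", "x"


def access_allowed(regions, addr, kind):
    """Return True if the first region covering addr grants the access kind."""
    for region in regions:
        if region.base <= addr < region.base + region.size:
            return kind in region.perms
    return False   # no region matched: deny


# Hypothetical memory map: executable code, and data plus stack without execute rights.
regions = [
    Region(base=0x0000_0000, size=0x8000, perms="rx"),
    Region(base=0x2000_0000, size=0x4000, perms="rw"),
]

print(access_allowed(regions, 0x0000_0100, "x"))  # prints: True
print(access_allowed(regions, 0x2000_0010, "x"))  # prints: False
```

An instruction fetch from the data region is rejected by the range check alone, with no page tables involved.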

Logging

Logging is implemented via built-in trace blocks, such as ARM’s Embedded Trace Macrocell, which produce a stream of packets with information about branches and core state without stopping the pipeline. The data is sent through dedicated high-bandwidth pins to an external analyzer, which can reconstruct the full history of instruction execution in real time.

Energy efficiency and physical limitations

Energy efficiency is achieved by reducing dynamic power: the simple instruction decoder needs fewer transistors, and clock gating switches off unused functional blocks. The limits are set by static leakage, which grows at very small process nodes, and by critical path delays, which cap how far the clock frequency can be raised.
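
The effect of clock gating on dynamic power can be made concrete with the standard first-order switching-power relation, P = activity x C x V^2 x f. The sketch below is an illustration only: the function name dynamic_power and all numeric values are assumptions, and static leakage is deliberately left out.

```python
def dynamic_power(activity, capacitance, voltage, frequency):
    """First-order switching power in watts: activity * C * V^2 * f."""
    return activity * capacitance * voltage ** 2 * frequency


# Illustrative values only: 1 nF switched capacitance, 0.9 V supply, 1 GHz clock.
base = dynamic_power(activity=0.2, capacitance=1e-9, voltage=0.9, frequency=1e9)

# Clock gating idle blocks lowers the fraction of the chip that toggles each cycle.
gated = dynamic_power(activity=0.1, capacitance=1e-9, voltage=0.9, frequency=1e9)

print(gated / base)
```

Halving the activity factor halves dynamic power in this model, while leakage, which the formula does not cover, is unaffected by clock gating.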

History and development

Evolution began with the IBM 801 project, which laid down the principle of single-cycle pipelined execution with no memory operands in computational instructions. Berkeley RISC then introduced register windows, later adopted by SPARC, while MIPS embodied branch delay slots and a pipeline without hardware interlocks. The ARM architecture spread RISC dominance in mobile and embedded systems through a flexible core licensing system and the addition of optional vector and cryptographic processing extensions.