MIPS (Simplified pipelined RISC architecture without interlocks)

MIPS (Microprocessor without Interlocked Pipeline Stages) is a processor architecture in which, in its classic form, all instructions execute strictly in order and pass through the same fixed sequence of pipeline stages. The original design omits the hardware interlocks that would check whether the result of a previous instruction is ready. Responsibility for filling the pipeline correctly is shifted to the compiler, which makes the processor simpler, lowers its power consumption, and allows a higher clock frequency.

Historically MIPS dominated Silicon Graphics workstations and DEC servers, and today it is licensed for embedded systems. Thanks to determinism and low power consumption, MIPS-based cores are found in MikroTik routers, home gateways, automotive media systems, as well as in cheap Chinese tablets and game consoles like the PlayStation Portable. It is also often chosen as a training ground for studying pure RISC design without the incidental complexities of x86.

The main problem is data hazards, which occur when an instruction tries to use the result of a previous one that has not yet finished. Since the classic design has no hardware interlocks, the programmer or compiler must insert a bubble (NOP) or rearrange the code. The problem is aggravated by the branch delay slot: the instruction immediately following a branch is executed whether or not the branch is taken, which breaks intuitive logic and creates hard-to-catch errors in hand-written assembly. An incorrectly filled slot produces subtle misbehavior, for example hard-to-debug visual artifacts in graphics code.
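The delay-slot behavior described above can be illustrated with a toy interpreter. This is a minimal sketch, not a model of any real core: the tuple-based instruction format, the `run` function, and the register names are all hypothetical and exist only for illustration.

```python
def run(program, regs=None, max_steps=100):
    """Interpret a toy MIPS-like program with one branch delay slot.

    Hypothetical instruction tuples (illustration only):
      ("addi", rd, rs, imm)    rd = rs + imm
      ("beq", rs, rt, target)  go to index `target` if rs == rt
      ("nop",)
    """
    regs = dict(regs or {})
    regs.setdefault("zero", 0)
    pc = 0
    pending = None  # branch target that takes effect after the delay slot
    for _ in range(max_steps):
        if pc >= len(program):
            break
        op = program[pc]
        next_pc = pc + 1
        if pending is not None:
            # This instruction sits in the delay slot: it still executes,
            # and only afterwards does control move to the branch target.
            next_pc, pending = pending, None
        if op[0] == "addi":
            _, rd, rs, imm = op
            regs[rd] = regs.get(rs, 0) + imm
        elif op[0] == "beq":
            _, rs, rt, target = op
            if regs.get(rs, 0) == regs.get(rt, 0):
                pending = target
        pc = next_pc
    return regs


program = [
    ("beq", "zero", "zero", 3),   # 0: always taken
    ("addi", "t0", "zero", 1),    # 1: delay slot, executes anyway
    ("addi", "t1", "zero", 1),    # 2: skipped by the branch
    ("addi", "t2", "zero", 1),    # 3: branch target
]
regs = run(program)
```

After the run, `t0` and `t2` have been written while `t1` has not, even though the branch at index 0 is always taken. A programmer who forgets this and leaves a meaningful instruction at index 1 gets exactly the kind of silent error the text describes.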

How MIPS works

The classic five-stage MIPS pipeline (IF: instruction fetch, ID: instruction decode, EX: execute, MEM: memory access, WB: write back) is built around the principle of one instruction per cycle. Unlike later superscalar out-of-order designs (such as high-end ARM Cortex-A or Intel Core), which reorder the instruction stream in hardware inside the core (Out-of-Order Execution) and spend transistors on a complex scheduler, classic MIPS stays strictly minimal and in-order. Instead of the microcode used in CISC systems, the MIPS control unit generates control signals directly with hardwired logic. Efficient operation without interlocked stages relies on software: the compiler analyzes the code in advance and spaces out dependent instructions (static scheduling). For example, a load from memory (LW) delivers its result only at the end of the MEM stage, so an arithmetic operation that uses the loaded value must not be placed immediately after it. Hardware forwarding (result bypassing) is implemented, but minimally: data from the MEM stage can go directly to the EX stage of a following instruction, bypassing the register file and reducing the wait from two cycles to one. The processor still does not insert a stall on its own; leaving that gap is the code's task.
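The static scheduling step can be sketched as a pass that looks for a load whose result is consumed by the very next instruction and separates the two. This is a simplified illustration under stated assumptions: the four-field tuple format `(opcode, dest, src1, src2)` and the function name `insert_load_delay_nops` are hypothetical, and a real compiler would first try to move an independent instruction into the gap rather than emit a NOP.

```python
def insert_load_delay_nops(instrs):
    """Insert a NOP after each load whose result the next instruction reads.

    Hypothetical format: (opcode, dest, src1, src2); unused fields are None.
    """
    out = []
    for i, ins in enumerate(instrs):
        out.append(ins)
        op, dest = ins[0], ins[1]
        if op == "lw" and dest is not None and i + 1 < len(instrs):
            next_sources = instrs[i + 1][2:]
            if dest in next_sources:
                # Load-use hazard: the loaded value is not ready in time.
                out.append(("nop", None, None, None))
    return out


code = [
    ("lw",  "t0", "a0", None),
    ("add", "t1", "t0", "t2"),   # reads t0 right after the load: hazard
    ("lw",  "t3", "a1", None),
    ("add", "t4", "t1", "t2"),   # does not read t3: no hazard
]
scheduled = insert_load_delay_nops(code)
```

Only the first load is followed by a NOP; the second load is left alone because the instruction after it does not depend on `t3`.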

MIPS functionality

  1. Pipeline architecture with phase separation. The MIPS microprocessor implements the classic five-stage instruction pipeline. The stages include fetch (IF), decode (ID), execute (EX), memory access (MEM), and write back (WB). This decomposition allows up to five instructions to be processed simultaneously, theoretically achieving a throughput of one operation per cycle.
  2. Hazard detection (interlock) unit. Contrary to the historical meaning of the acronym, modern MIPS implementations contain dependency detection logic. The hardware interlock detects RAW (Read After Write) register conflicts and stalls the pipeline until the conflict is resolved, holding the dependent instruction in the ID stage and sending bubbles down the pipeline in its place.
  3. Direct data bypass mechanism. Forwarding is implemented via multiplexers at the ALU inputs. The result from the EX/MEM or MEM/WB output is sent back to the EX-stage input before the actual write to the register file. This minimizes data dependency penalties, eliminating wait cycles for most arithmetic chains.
  4. Branch delay slot processing. The architecture uses one delay slot after branch instructions. The compiler or assembly programmer is obliged to place an independent instruction in this slot that executes before the control flow changes. The hardware logic forcibly executes this slot, not canceling it on a conditional branch.
  5. Static branch prediction. Simple MIPS pipelines treat conditional branches as not taken and continue fetching sequentially. In the classic five-stage design the branch is resolved early enough that the delay slot (item 4) absorbs the one-cycle cost. In implementations that resolve the branch later, instructions fetched after the delay slot on a taken branch are annulled by clearing the control signals in the IF/ID pipeline registers, and those cycles are lost.
  6. Register file with multiported access. The integer register block has two read ports and one write port. Reading occurs at the decode stage and writing at the WB stage. In the classic design the write takes place in the first half of the clock cycle and the read in the second half, which allows the same register to be written and read in one cycle without a data race.
  7. Arithmetic logic unit with integrated shifter. The ALU performs bit shifts and logical operations directly in the execution path. Operands come either from the register file or through the forwarding path. Immediate operands from the imm field of an I-type instruction are sign-extended for arithmetic, loads, stores, and branches, and zero-extended for the logical instructions (andi, ori, xori).
  8. HI/LO multiply and divide module. Isolated HI and LO registers serve integer multiplication and division operations. The mult and div instructions initiate a multi-cycle operation without blocking the pipeline. The result is read by the mfhi and mflo instructions. If a read is attempted before calculation completion, the hardware interlock stalls the pipeline.
  9. CP0 system coprocessor. The control coprocessor encapsulates exception handling and virtual memory. CP0 registers store the interrupt handling vector, exception mask, fault cause identifier, and the erroneous instruction address (EPC). Access to CP0 is performed by the privileged mfc0 and mtc0 instructions strictly in kernel mode.
  10. Virtual address translation via TLB. The translation lookaside buffer converts virtual addresses to physical ones at the fetch and memory access stages. In the classic architecture the TLB is software-managed: on a miss the hardware raises a TLB Refill exception, and a short kernel handler reads the missing entry from the page table in memory and writes it into the TLB. A dedicated exception vector for refills keeps this path short.
  11. Deterministic exception model. Precise interrupts guarantee that all instructions before the faulting one are completed, and subsequent ones have no side effects. When an exception occurs in the EX or MEM stage, the pipeline is flushed, the return address is saved in EPC, and control is atomically transferred to the exception handling vector.
  12. Atomic synchronization instructions. The Load Linked (LL) and Store Conditional (SC) instruction pair implements lock-free algorithm primitives. LL loads a word and marks the cache line, and SC checks for modifications. If the line has been modified, SC clears the target register to zero, signaling a write failure without bus locking.
  13. Unsupported instructions and traps. Encodings that do not correspond to valid operations cause a Reserved Instruction exception. The trap mechanism allows functions that are absent in hardware to be emulated by software handlers. The same approach is used for floating-point without a coprocessor (via the Coprocessor Unusable exception) and for emulating unaligned memory access (via the Address Error exception).
  14. Cache memory configuration. The Harvard-style organization with a separate instruction cache (I-Cache) and data cache (D-Cache) eliminates structural conflicts between the IF and MEM stages. On a miss, the cache logic stalls the pipeline and fills the line, starting with the critical word so that execution can resume as soon as it arrives.
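  15. Direct segmented access mode. In addition to the TLB-mapped user segment (kuseg), the 32-bit MIPS address space contains the kernel segments kseg0 and kseg1 with a fixed mapping. kseg0 (0x80000000 to 0x9FFFFFFF, cached) and kseg1 (0xA0000000 to 0xBFFFFFFF, uncached) addresses are translated directly by subtracting the segment base, bypassing the TLB, so both map onto the first 512 MB of physical memory. This is critically important for low-level initialization and exception handlers.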
  16. Memory access ordering control. The SYNC instruction creates a memory barrier, guaranteeing that all preceding load and store operations complete before subsequent ones begin. This mechanism is necessary in multiprocessor configurations and for device access, where it prevents the core from reordering accesses. Reordering by the compiler has to be prevented separately with a compiler-level barrier.
  17. WAIT power-saving mode. The WAIT instruction stops core clocking until a hardware interrupt occurs. The pipeline is halted, reducing dynamic power consumption. Upon exiting the wait mode, exception processing begins with minimal delay, as the pipeline state is preserved without a flush.
  18. Interrupt vector table in Vectored Interrupt mode. Advanced MIPS versions implement a vectored EIC (External Interrupt Controller) mode. The hardware delivers the interrupt number directly into the Cause register, allowing the processor to jump immediately to a specific handler, bypassing the peripheral status polling stage in a common dispatcher.
  19. Barrel shifter scheme. The hardware shifter is integrated into the execution path and is capable of performing arithmetic and logical shifts by an arbitrary number of bits in one cycle. It is controlled by the shamt field of R-type instructions or the low-order bits of the source register for variable shift instructions.
  20. Unaligned address support. Despite the requirement for word-aligned data, special LWL/LWR and SWL/SWR instructions allow reading and writing of arbitrarily aligned words. The instruction pair captures adjacent aligned containers, and the hardware shifter combines the bytes into the destination register.
  21. Count/Compare timer mechanism. The Count/Compare module in CP0 contains a Count register that increments at a fixed rate, typically half the core frequency (the exact rate is implementation dependent). When Count matches the Compare register, a timer interrupt is generated. This provides a deterministic tick for the task scheduler in real-time systems without external timers.
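The forwarding multiplexer control from item 3 and the load-use interlock from item 2 can be sketched as two small predicates. This follows the usual textbook formulation for a five-stage pipeline and is not the logic of any particular MIPS core; the function names and the `(reg_write, rd)` tuple layout are hypothetical.

```python
def forward_select(id_ex_rs, ex_mem, mem_wb):
    """Choose the source for an ALU input that reads register `id_ex_rs`.

    ex_mem and mem_wb are (reg_write, rd) tuples describing the instructions
    one and two stages ahead. Returns "EX/MEM", "MEM/WB", or "REGFILE".
    """
    ex_write, ex_rd = ex_mem
    mem_write, mem_rd = mem_wb
    # Register 0 is hard-wired to zero, so it is never forwarded.
    if ex_write and ex_rd != 0 and ex_rd == id_ex_rs:
        return "EX/MEM"   # the most recent producer wins
    if mem_write and mem_rd != 0 and mem_rd == id_ex_rs:
        return "MEM/WB"
    return "REGFILE"


def needs_load_use_stall(id_ex_is_load, id_ex_rt, if_id_rs, if_id_rt):
    """True when the instruction in ID reads a register that a load in EX
    has not produced yet; forwarding alone cannot cover this case."""
    return bool(id_ex_is_load and id_ex_rt != 0
                and id_ex_rt in (if_id_rs, if_id_rt))
```

For example, `forward_select(8, (True, 8), (True, 8))` picks the EX/MEM value because it is the newer of the two pending writes to register 8, and `needs_load_use_stall(True, 8, 8, 9)` reports that one bubble is required.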

Comparisons

  • MIPS vs ARM (RISC architecture). Comparing MIPS and ARM reveals a difference in the philosophy of conditional branch handling. The MIPS architecture relies on a branch delay slot, requiring the compiler to fill the instruction after a branch, whereas ARM implements conditional execution of instructions, avoiding pipeline flushes without extra code and preserving instruction density.
  • MIPS vs x86 (CISC architecture). Comparing MIPS with the x86 architecture demonstrates the conflict between simplicity and decoding complexity. The fixed 32-bit MIPS instruction length makes decoding trivial and predictable and significantly simplifies the pipeline microarchitecture, at the cost of potentially lower code density than the variable-length x86 encoding.
  • MIPS vs RISC-V (open standard). Comparing MIPS and RISC-V, one must note the evolution in delay slot handling. MIPS architecturally includes a mandatory delay slot after branches, a microarchitectural detail that complicates superscalar implementations, which modern RISC-V has completely abandoned to simplify multi-issue decoding and increase out-of-order execution efficiency.
  • MIPS vs SPARC (scalable architecture). The fundamental divergence between MIPS and SPARC lies in the register file model. MIPS uses a simple linear array of thirty-two registers, sufficient for pipeline efficiency, while SPARC implements the register window concept, automatically switching context on procedure calls and reducing stack accesses but increasing the hardware complexity of the chip.
  • MIPS vs PowerPC (performance architecture). The difference between MIPS and PowerPC shows in Load/Store instruction semantics. Classic MIPS exposes the load delay to software: the instruction immediately after a load still sees the old value of the destination register, which requires caution from the programmer or compiler. PowerPC interlocks in hardware, stalling the pipeline until the loaded value is available, so the dependency is transparent to the user programming model.
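The claim that fixed 32-bit instructions make decoding trivial can be made concrete: every field sits at a fixed bit position, so decoding is a handful of shifts and masks. The sketch below covers only the three basic formats (R, I, J) and always sign-extends the immediate, which is a simplification; the `decode` function and the dictionary layout are hypothetical.

```python
def decode(word):
    """Split a 32-bit MIPS instruction word into its fields."""
    opcode = (word >> 26) & 0x3F
    rs = (word >> 21) & 0x1F
    rt = (word >> 16) & 0x1F
    if opcode == 0:
        # R-type: register-to-register operation selected by funct.
        return {"type": "R", "rs": rs, "rt": rt,
                "rd": (word >> 11) & 0x1F,
                "shamt": (word >> 6) & 0x1F,
                "funct": word & 0x3F}
    if opcode in (2, 3):
        # J-type (j, jal): 26-bit word index within the current region.
        return {"type": "J", "opcode": opcode, "target": word & 0x03FFFFFF}
    # Everything else is treated as I-type here (simplification).
    imm = word & 0xFFFF
    if imm & 0x8000:
        imm -= 0x10000  # sign-extend the 16-bit immediate
    return {"type": "I", "opcode": opcode, "rs": rs, "rt": rt, "imm": imm}


add_fields = decode(0x012A4020)    # add  $t0, $t1, $t2
addi_fields = decode(0x2128FFFF)   # addi $t0, $t1, -1
jump_fields = decode(0x08000010)   # j with target field 0x10
```

No instruction boundary has to be discovered before decoding starts, which is the property that a variable-length encoding such as x86 lacks.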

OS and driver support

The MIPS architecture provides a standardized set of privileged resources, including the Status register for managing access levels (User, Supervisor, Kernel), the Cause register for exception identification, and an integrated memory management unit (MMU) with a software-managed TLB, allowing the OS kernel to implement virtual memory and process isolation through TLB Refill exception handling; device drivers interact with peripherals via memory-mapped I/O registers (MMIO), using ordinary load/store instructions on addresses in the uncached, unmapped kseg1 segment (accessible only in kernel mode), which eliminates the need for special I/O instructions and unifies the software access model.
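The fixed mapping that drivers rely on for MMIO is simple arithmetic on the segment base. The sketch below assumes the standard 32-bit segment layout (kseg0 at 0x80000000, kseg1 at 0xA0000000, each 512 MB); the helper names and the example device address are hypothetical.

```python
KSEG0_BASE = 0x80000000  # unmapped, cached
KSEG1_BASE = 0xA0000000  # unmapped, uncached
KSEG2_BASE = 0xC0000000  # TLB-mapped kernel space starts here
UNMAPPED_SIZE = 0x20000000  # 512 MB reachable through kseg0/kseg1


def unmapped_to_physical(vaddr):
    """Translate a kseg0 or kseg1 virtual address to a physical address."""
    if KSEG0_BASE <= vaddr < KSEG1_BASE:
        return vaddr - KSEG0_BASE
    if KSEG1_BASE <= vaddr < KSEG2_BASE:
        return vaddr - KSEG1_BASE
    raise ValueError("address is TLB-mapped, not in kseg0/kseg1")


def mmio_address(paddr):
    """Return the uncached kseg1 alias a driver would use for `paddr`."""
    if not 0 <= paddr < UNMAPPED_SIZE:
        raise ValueError("physical address not reachable through kseg1")
    return KSEG1_BASE + paddr


uart_vaddr = mmio_address(0x1F000900)  # hypothetical device register
```

Both `unmapped_to_physical(0x80001000)` and `unmapped_to_physical(0xA0001000)` give physical address 0x1000; a driver uses the kseg1 alias so that reads and writes reach the device instead of the cache.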

Security and execution integrity

Isolation of user processes is ensured by dividing the virtual address space (4 GB, split into the user kuseg area and the kernel kseg0, kseg1, and kseg2 areas) with a hardware access check on every address translation, while protection against buffer overflow attacks can be implemented at the OS level by setting the Execute Inhibit (XI) bit in TLB entries, preventing code execution in the stack and heap; kernel integrity is maintained by a strict exception hierarchy, in which exceptions are taken atomically and the context is saved automatically in dedicated registers: the return address in EPC (ErrorEPC for non-maskable interrupts and reset-class events) and the faulting address in BadVAddr for address-related exceptions.

Logging and tracing system

Hardware debug and logging support is implemented via the EJTAG (Enhanced JTAG) interface, which provides non-intrusive instruction stream monitoring through a real-time trace mechanism (PDtrace) that emits timestamped trace packets; the Debug and Trace Control registers allow filtering events by address ranges and instruction types (branches, function calls); for software exception logging, the OS kernel reads the error context from the Cause and EPC registers directly in the handler, writing structured information into a kernel ring buffer with subsequent output via a serial port or network interface.
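The software logging path can be sketched as decoding the Cause register and appending a structured record to a bounded buffer. The field positions used here (ExcCode in bits 6..2, BD in bit 31) and the exception code names follow the standard MIPS32 layout; the record format, the function names, and the buffer size are hypothetical.

```python
from collections import deque

EXC_NAMES = {
    0: "Int", 1: "Mod", 2: "TLBL", 3: "TLBS", 4: "AdEL", 5: "AdES",
    6: "IBE", 7: "DBE", 8: "Sys", 9: "Bp", 10: "RI", 11: "CpU",
    12: "Ov", 13: "Tr",
}


def decode_cause(cause, epc):
    """Build a structured record from the Cause and EPC register values."""
    exc_code = (cause >> 2) & 0x1F
    in_delay_slot = bool((cause >> 31) & 1)
    # With BD set, EPC points at the branch and the faulting
    # instruction is the one in its delay slot, at EPC + 4.
    fault_pc = (epc + 4) & 0xFFFFFFFF if in_delay_slot else epc
    return {
        "exc_code": exc_code,
        "name": EXC_NAMES.get(exc_code, "reserved"),
        "epc": epc,
        "in_delay_slot": in_delay_slot,
        "fault_pc": fault_pc,
    }


def log_exception(ring, cause, epc):
    """Append a decoded record; the oldest entry is dropped when full."""
    record = decode_cause(cause, epc)
    ring.append(record)
    return record


kernel_log = deque(maxlen=4)  # hypothetical ring buffer of four records
first = log_exception(kernel_log, 0x00000028, 0x80001000)
second = log_exception(kernel_log, 0x80000010, 0x80002000)
```

The first record decodes as a Reserved Instruction exception at EPC itself; the second is an address error on load raised in a branch delay slot, so the faulting instruction is at EPC + 4. A `deque` with `maxlen` stands in for the kernel ring buffer.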

Limitations

Exposing the pipeline stages to software rules out register renaming and speculative execution in classic implementations, leading to cycle losses on data dependencies (load delay slot), while the flat interrupt model with a single general exception vector (usually at address 0x80000180) requires software dispatching of causes by parsing the Cause register, introducing a deterministic but potentially significant delay before entering a specific handler; also, vulnerability to side-channel attacks (Spectre/Meltdown) is theoretically lower due to the absence of deep speculation, but modernized versions with out-of-order execution require the insertion of synchronization barrier instructions (SYNC) to guarantee memory ordering.

History, evolution, and standardization

Developed at Stanford University under the leadership of John Hennessy in the early 1980s as a research RISC processor project, the MIPS architecture evolved through commercialization by MIPS Computer Systems into the R2000/R3000 families with software-managed TLB, then into 64-bit R4000 processors with a deepened pipeline and cache coherency support for symmetric multiprocessing; after the company was acquired by Imagination Technologies, development split into high-performance Warrior cores with hardware virtualization (MIPS VZ) and microcontroller-oriented microMIPS profiles with mixed 16/32-bit instruction encoding; thereafter the architecture was transferred to the management of MIPS Tech LLC and then to Wave Computing, which for a time offered free licensing through the MIPS Open initiative; after that program ended, the company, operating again under the MIPS name, turned to developing cores based on the open RISC-V instruction set.