MIPS32 (32-bit RISC architecture with fixed instruction length)

MIPS32 is a classic example of a RISC architecture: in the base instruction set every machine instruction is exactly 32 bits long (the optional MIPS16e and microMIPS extensions add 16-bit encodings). The architecture combines 32 general-purpose registers with a single, simple memory addressing mode, so instructions can be decoded with little logic and a pipelined implementation can keep its stages busy without complex decoding circuits.

MIPS32 is widely used in embedded electronics such as routers, cameras, and household appliances. Because the architecture is licensed as intellectual property, MIPS cores are integrated into chips by companies such as Microchip Technology (PIC32 microcontrollers) and MediaTek (router system-on-chips). Historically, MIPS-based processors powered the Nintendo 64, Sony PlayStation, and PlayStation Portable game consoles, and MIPS cores have also been used in automotive driver-assistance chips.

Engineers also face the consequences of early design decisions. Classic MIPS32 has branch delay slots: the instruction after an ordinary branch executes regardless of whether the branch is taken, which complicates hand-written assembly. The architecture depends on a single licensor and has lost market share to ARM. As a result, its ecosystem has shrunk, and fewer actively maintained development tools and vendor toolchains are available than for competing architectures. Finally, the 32-bit virtual address space limits each process to at most four gigabytes (two gigabytes of user space in the standard memory map). This rules MIPS32 out for workloads that need more memory.

How MIPS32 works

The classic implementation divides execution into a five-stage pipeline:

- instruction fetch;
- decode and register read;
- execution in the arithmetic logic unit;
- memory access;
- write-back.

MIPS32 specifies the instruction set rather than the pipeline, and later cores use longer pipelines.

Unlike x86-64, where instruction length varies, every MIPS32 instruction is 32 bits long and its fields sit at fixed bit positions, so decoding is fast and predictable. ARMv7 has 16 general-purpose registers; MIPS has 32 and a more orthogonal instruction set. This reduces register spills to memory and gives the compiler more freedom when optimizing code.

Pipelined implementations typically forward data: the result of an arithmetic instruction is passed directly to the inputs of the next instruction before it is written to the register file.

Memory management and exception handling are controlled through coprocessor 0 (CP0). Its registers configure the Translation Lookaside Buffer (TLB) that translates virtual addresses to physical addresses, and TLB misses are traditionally resolved by a software refill handler rather than by hardware. On an exception, the hardware saves only minimal state (the restart address and the cause) and jumps to a vector. Software does the rest, instead of the processor running the microcoded entry sequences found in many CISC designs.
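Because the fields sit at fixed positions, decoding a MIPS32 instruction reduces to shifts and masks, and all fields can be extracted in parallel before the opcode is even examined. The following is a minimal sketch of that idea. The helper names `decode` and `format_of` are hypothetical, and the format classification deliberately ignores sub-encodings such as REGIMM, SPECIAL2, and the coprocessor opcodes.

```python
# Minimal sketch: split a 32-bit MIPS32 instruction word into its
# fixed-position fields. Helper names are hypothetical, not a library API.

def decode(word: int) -> dict:
    """Return every field of an instruction word.

    Each field occupies the same bits in every format, so a decoder can
    extract them all at once and decide later which ones are meaningful.
    """
    word &= 0xFFFFFFFF
    imm = word & 0xFFFF
    return {
        "op": (word >> 26) & 0x3F,     # bits 31..26: major opcode
        "rs": (word >> 21) & 0x1F,     # bits 25..21: first source register
        "rt": (word >> 16) & 0x1F,     # bits 20..16: second source / I-type destination
        "rd": (word >> 11) & 0x1F,     # bits 15..11: R-type destination
        "sa": (word >> 6) & 0x1F,      # bits 10..6: shift amount
        "funct": word & 0x3F,          # bits 5..0: R-type function code
        "imm": imm - 0x10000 if imm & 0x8000 else imm,  # sign-extended immediate
        "target": word & 0x03FFFFFF,   # bits 25..0: J-type word index
    }


def format_of(word: int) -> str:
    """Classify by opcode: 0 (SPECIAL) is R-type, 2 and 3 (J, JAL) are J-type."""
    op = (word >> 26) & 0x3F
    if op == 0:
        return "R"
    if op in (2, 3):
        return "J"
    return "I"  # simplification: REGIMM, SPECIAL2 and COPz sub-encodings ignored
```

For example, `decode(0x012A4020)`, which is the encoding of `add $t0, $t1, $t2`, yields rs = 9, rt = 10, rd = 8, and funct = 0x20. `decode(0x2128FFFF)` (`addi $t0, $t1, -1`) yields op = 8 and imm = -1.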

MIPS32 functionality

  1. Register File and Data Model. The MIPS32 core contains 32 general-purpose registers, each 32 bits wide. Register $0 is hardwired to zero: reads return 0 and writes are discarded. Byte order in memory is either little-endian or big-endian and is fixed for a given system, typically selected at reset. Instruction operands are registers, 16-bit immediate values (sign-extended or zero-extended depending on the instruction), and 5-bit shift amounts.
  2. Instruction Types and Encoding Formats. Machine code uses three main formats: R-type for register-to-register operations, I-type for operations with a 16-bit immediate operand, and J-type for jumps. Every instruction is exactly 32 bits long. The 6-bit opcode field (bits 31..26) selects the instruction or instruction class. For R-type instructions (opcode 0, called SPECIAL), the 6-bit funct field selects the operation and the 5-bit sa field holds the shift amount.
  3. Integer Arithmetic Operations. ADD and ADDU both perform 32-bit two's-complement addition. ADD raises an Integer Overflow exception when the signed result overflows, while ADDU never traps and simply wraps around; despite its name, ADDU is not limited to unsigned operands. SUB and SUBU behave the same way for subtraction. MULT forms a 64-bit signed product in the HI and LO registers (MULTU does the same for unsigned operands), and the result is read with MFHI and MFLO. DIV and DIVU place the quotient in LO and the remainder in HI.
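  4. Logical and Bit Manipulation. The architecture provides the bitwise operations AND, OR, XOR, and NOR, plus the immediate forms ANDI, ORI, and XORI. Shifts by a constant are implemented by SLL, SRL, and SRA; SLLV, SRLV, and SRAV take the shift amount from a register. Logical shifts fill vacated positions with zeros, whereas the arithmetic shift SRA duplicates the sign bit. SRA therefore divides a signed number by a power of two with rounding toward negative infinity (-7 shifted right by one gives -4). To obtain C's truncating division (-7 / 2 gives -3), compilers add a bias to negative operands.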
  5. Comparison and Conditional Branching. SLT and SLTU compare two values as signed or unsigned numbers, as do their immediate forms SLTI and SLTIU. They write 1 to the destination register if the first operand is less than the second, and 0 otherwise. BEQ and BNE branch on equality or inequality of two registers. Branches that compare a register with zero use BGEZ, BGTZ, BLEZ, and BLTZ. Before Release 6, a two-register "branch if less than" is synthesized as SLT followed by BNE against $0.
  6. Unconditional Jumps and Addressing. The J instruction performs a direct jump within the current 256-megabyte region. Its 26-bit field, shifted left by two, replaces the low 28 bits of the address of the delay-slot instruction. JAL additionally saves the return address in register $31; this is the address of the instruction after the delay slot, PC + 8. JR jumps to the address held in a register. JALR combines such an indirect jump with saving the return address in a chosen register ($31 by default).
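  7. Memory Operations and Addressing Modes. Loads are performed by LB, LH, and LW. LB and LH sign-extend, while LBU and LHU are the zero-extending variants. Stores use SB, SH, and SW. The only data addressing mode is base plus offset: the effective address is the contents of a base register plus a sign-extended 16-bit immediate offset. Halfword and word accesses must be naturally aligned, and a misaligned address raises an Address Error exception. Before Release 6, LWL/LWR and SWL/SWR are available for unaligned words.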
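  8. Constant Loading and Masking. LUI loads a 16-bit immediate value into the upper half of a register and clears the lower 16 bits. Following LUI with ORI sets the lower half, forming any 32-bit constant in two instructions. Some address sequences instead pair LUI with ADDIU or with a load/store offset, both of which sign-extend the lower half. These must add 1 to the upper half when bit 15 of the lower half is set, and assemblers do this through the %hi and %lo operators. LUI is the standard way to form the addresses of memory-mapped I/O registers and data tables.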
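  9. Exceptions and Interrupt Model. The processor implements precise exceptions. When an exception occurs, the address of the faulting instruction is saved in the EPC register. If that instruction is in a branch delay slot, EPC holds the address of the branch instead and the BD bit of the Cause register is set. The exception cause is recorded in the ExcCode field (bits 6..2) of the Cause register. The core sets the EXL bit of the Status register, which forces kernel mode, and jumps to an exception vector:
    - With the default configuration (Status.BEV = 0, EBase at its reset value), the general exception vector is 0x80000180.
    - TLB Refill exceptions use 0x80000000.
    - During boot (BEV = 1), the vectors lie in the uncached region at 0xBFC00200 and 0xBFC00380.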
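  10. Privileged Kernel Mode. The Status register defines the interrupt masks and the current privilege level, which is set by the KSU/UM field together with the EXL and ERL bits. In user mode, privileged instructions such as MFC0, MTC0, and CACHE are prohibited. Unless CP0 access has been granted through Status.CU0, executing them raises a Coprocessor Unusable exception. Memory protection is provided by the memory management unit configured through CP0. Its type depends on the implementation: application processors have a full TLB, while some microcontroller cores use a simpler fixed mapping.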
  11. Coprocessor CP0 and System Functions. MFC0 reads and MTC0 writes the system coprocessor registers. Exception handling, cache configuration, and address translation are all controlled through CP0. The ERET instruction returns from an exception handler in a single step:
    - It jumps to the address in EPC (or ErrorEPC).
    - It clears the EXL (or ERL) bit, restoring the previous mode and interrupt state.
    - It clears the LL reservation bit.
    Unlike branches, ERET has no delay slot.
  12. Multiply-Accumulate. Before Release 6, the base MIPS32 instruction set includes MADD, which multiplies two 32-bit signed operands and adds the 64-bit product to the HI/LO accumulator. MADDU does the same for unsigned operands, and MSUB/MSUBU subtract the product instead. The running sum stays in HI/LO across iterations and is read with MFHI/MFLO only once at the end, which speeds up convolution and dot products. The DSP extension adds three further accumulators for the same purpose.
  13. Power Management and Wait. The WAIT instruction stops instruction execution until an interrupt occurs, letting the implementation enter a low-power state; whether clocks are gated or slowed is implementation-dependent. No architectural state is lost. After the interrupt is serviced, execution continues with the instruction following WAIT, which matters for battery-powered embedded devices.
  14. Memory Synchronization Barriers. The SYNC instruction establishes an ordering point for memory accesses: loads and stores that precede the barrier complete before any loads and stores that follow it are performed. Software uses it when sharing memory with other processors or with DMA-capable devices. SYNC only orders accesses. On systems without hardware cache coherence, software must also write back or invalidate cache lines explicitly with the CACHE instruction.
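  15. Atomic Synchronization Operations. The LL and SC instruction pair implements lock-free read-modify-write sequences. LL loads a word and sets a reservation on its address. SC stores a new value only if the reservation is still intact, that is, if no other processor has written to the location and no exception return (ERET) has intervened. On success, SC writes 1 to register rt, the same register that supplied the value to store. On failure, it performs no store and writes 0, and software loops back to the LL.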
  16. Conditional Data Movement. MOVZ and MOVN copy a register depending on a third operand: MOVZ copies when the test register equals zero, and MOVN copies when it does not. Replacing a short branch with a conditional move avoids the pipeline flush that a mispredicted branch would cause. Release 6 replaces both instructions with SELEQZ and SELNEZ.
  17. Three-Operand Multiplication. The MUL instruction writes the low 32 bits of a product directly to a general-purpose register. Release 6 adds MUH, MULU, and MUHU for the high half and for unsigned operands, and it removes HI/LO together with the instructions that use them. Writing straight to the register file eliminates the separate MFLO/MFHI step. Multiplier latency depends on the implementation: some cores have single-cycle multipliers, while small cores may use iterative units that take several cycles.
  18. Leading-Bit Counting. CLO counts the leading ones in a register and CLZ counts the leading zeros; the count is placed in the destination register. These instructions support fast normalization in software floating-point code and priority encoding without software scanning loops.
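  19. Byte Order Reversal. The Release 2 instructions WSBH, ROTR, SEB, and SEH manipulate bytes and halfwords. WSBH swaps the two bytes within each halfword, and ROTR rotates a word right. SEB and SEH sign-extend the least-significant byte or halfword to 32 bits. WSBH followed by ROTR by 16 reverses all four bytes of a word. This speeds up conversion between big-endian network formats and little-endian data without lookup tables.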
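  20. Cache Memory Configuration. The CACHE instruction performs cache maintenance. Its op field selects the target cache (instruction, data, or secondary) and the operation, and a base-plus-offset address selects the line. Basic operations include invalidation and writeback of dirty lines to main memory. Cache contents are undefined after reset. Boot code therefore uses the Index Store Tag operation, which writes the TagLo/TagHi registers into the tag array, to initialize every line before running from cached memory.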
  21. Data Prefetching. The PREF instruction hints that the cache line at a computed address will soon be needed. Its hint field indicates whether the data is expected to be loaded or stored. PREF never raises address-related exceptions; if the address is invalid, the prefetch is silently dropped. Software can therefore issue prefetches ahead of a loop to hide memory latency without risking program termination.
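Several of the integer instructions in the list above differ only in edge cases, and those edge cases are easiest to see by executing them. The following Python sketch models those semantics on raw 32-bit patterns under the following assumptions:

- All function names are hypothetical helpers, not the API of any real simulator.
- The `IntegerOverflow` class merely stands in for the hardware exception.
- `div_pow2` shows one common compiler idiom for truncating division, not the only possible one.

```python
# Sketch of MIPS32 integer semantics on raw 32-bit patterns (Python ints).
# All names are hypothetical helpers, not a simulator API.

MASK = 0xFFFFFFFF

def s32(x: int) -> int:
    """Interpret the low 32 bits of x as a two's-complement integer."""
    x &= MASK
    return x - (1 << 32) if x & 0x80000000 else x

class IntegerOverflow(Exception):
    """Stands in for the Integer Overflow exception raised by ADD."""

def add(a: int, b: int) -> int:
    """ADD: trap if the signed sum does not fit in 32 bits."""
    total = s32(a) + s32(b)
    if not -(1 << 31) <= total < (1 << 31):
        raise IntegerOverflow
    return total & MASK

def addu(a: int, b: int) -> int:
    """ADDU: same bit pattern as ADD, but wraps instead of trapping."""
    return (a + b) & MASK

def srl(x: int, sa: int) -> int:
    """SRL: logical right shift, vacated bits become zero."""
    return (x & MASK) >> sa

def sra(x: int, sa: int) -> int:
    """SRA: arithmetic right shift, copies the sign bit (rounds toward -inf)."""
    return (s32(x) >> sa) & MASK

def div_pow2(x: int, k: int) -> int:
    """Signed x / 2**k truncated toward zero (1 <= k <= 31), compiler-style."""
    bias = srl(sra(x, 31), 32 - k)      # 2**k - 1 if x is negative, else 0
    return sra(addu(x, bias), k)

def slt(a: int, b: int) -> int:
    return int(s32(a) < s32(b))           # signed comparison

def sltu(a: int, b: int) -> int:
    return int((a & MASK) < (b & MASK))   # unsigned comparison

def lui_addiu(addr: int) -> int:
    """Rebuild addr the way LUI %hi(addr) followed by ADDIU %lo(addr) does."""
    hi = ((addr + 0x8000) >> 16) & 0xFFFF      # %hi compensates for a negative %lo
    lo = addr & 0xFFFF
    lo = lo - 0x10000 if lo & 0x8000 else lo   # ADDIU sign-extends its immediate
    return ((hi << 16) + lo) & MASK

def madd(hi: int, lo: int, a: int, b: int) -> tuple:
    """MADD: add the signed 64-bit product a*b to the HI:LO accumulator."""
    acc = (((hi << 32) | lo) + s32(a) * s32(b)) & 0xFFFFFFFFFFFFFFFF
    return acc >> 32, acc & MASK

def movz(rd: int, rs: int, rt: int) -> int:
    """MOVZ rd, rs, rt: rd receives rs only when rt is zero."""
    return rs if rt == 0 else rd

def clz(x: int) -> int:
    return 32 - (x & MASK).bit_length()

def clo(x: int) -> int:
    return clz(~x & MASK)

def wsbh(x: int) -> int:
    """Swap bytes within each halfword: 0xAABBCCDD -> 0xBBAADDCC."""
    x &= MASK
    return ((x & 0x00FF00FF) << 8) | ((x >> 8) & 0x00FF00FF)

def rotr(x: int, n: int) -> int:
    x &= MASK
    return ((x >> n) | (x << (32 - n))) & MASK

def seb(x: int) -> int:
    """Sign-extend the least-significant byte to 32 bits."""
    return (((x & 0xFF) ^ 0x80) - 0x80) & MASK
```

For instance:

- `s32(sra(-7 & MASK, 1))` is -4, while `s32(div_pow2(-7 & MASK, 1))` is -3.
- `rotr(wsbh(0x11223344), 16)` gives `0x44332211`, the byte-swapped word.
- Accumulating `madd` over the pairs (1, 4), (2, 5), (3, 6) leaves 32 in LO, which is the dot-product pattern from the multiply-accumulate entry.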
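The address arithmetic behind the jump and load/store entries in the list above can be checked with a few lines of code. This is a sketch under the following assumptions:

- Memory is modeled as a plain `bytes` object indexed by address.
- The `AddressError` class is a stand-in for the AdEL exception.
- All helper names are hypothetical.

```python
# Sketch of MIPS32 address arithmetic. Memory is a bytes object indexed
# by address; every name here is hypothetical.

MASK = 0xFFFFFFFF

class AddressError(Exception):
    """Stands in for the AdEL exception on a misaligned load."""

def sext16(imm: int) -> int:
    """Sign-extend a 16-bit immediate."""
    imm &= 0xFFFF
    return imm - 0x10000 if imm & 0x8000 else imm

def effective_address(base: int, offset16: int) -> int:
    """Base register plus sign-extended 16-bit offset, wrapped to 32 bits."""
    return (base + sext16(offset16)) & MASK

def jump_target(pc: int, index26: int) -> int:
    """J/JAL: top 4 bits of the delay-slot address (PC + 4), then index << 2."""
    region = (pc + 4) & 0xF0000000
    return region | ((index26 & 0x03FFFFFF) << 2)

def jal_link(pc: int) -> int:
    """Return address JAL writes to $31: the instruction after the delay slot."""
    return (pc + 8) & MASK

def load_byte(mem: bytes, addr: int, signed: bool) -> int:
    """LB when signed is True, LBU otherwise."""
    b = mem[addr]
    if signed and b & 0x80:
        b -= 0x100
    return b & MASK

def load_word(mem: bytes, addr: int, big_endian: bool) -> int:
    """LW: the address must be a multiple of 4."""
    if addr % 4:
        raise AddressError(hex(addr))
    return int.from_bytes(mem[addr:addr + 4], "big" if big_endian else "little")
```

The sketch illustrates four behaviors:

- A negative offset works because it is sign-extended: `effective_address(0x10000010, 0xFFFC)` is `0x1000000C`.
- A `J` placed in the last word of a 256-megabyte region lands in the next region, because the upper bits come from PC + 4: `jump_target(0x8FFFFFFC, 0)` is `0x90000000`.
- Byte order matters: the same four bytes `12 34 56 80` load as `0x12345680` in big-endian mode and `0x80563412` in little-endian mode.
- `LB` and `LBU` differ: the byte `0x80` loads as `0xFFFFFF80` with `LB` and as `0x80` with `LBU`.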
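The LL/SC entry in the list above implies a retry loop in software. The single-threaded model below demonstrates that loop. The `Memory` class and its per-core reservation bookkeeping are hypothetical simplifications: real hardware keeps one LLbit per core, may track reservations at a coarser granularity, and may also fail an SC for other reasons, such as an intervening ERET.

```python
# Hypothetical single-threaded model of LL/SC reservations.
# Real cores keep an LLbit per core; granularity and extra failure
# causes (e.g. ERET) are implementation details not modeled here.

class Memory:
    def __init__(self):
        self.words = {}
        self.reservation = {}  # core id -> reserved address

    def ll(self, core: int, addr: int) -> int:
        """LL: load a word and set this core's reservation on addr."""
        self.reservation[core] = addr
        return self.words.get(addr, 0)

    def sw(self, core: int, addr: int, value: int) -> None:
        """Ordinary store: breaks every other core's reservation on addr."""
        self.words[addr] = value & 0xFFFFFFFF
        for other, reserved in list(self.reservation.items()):
            if other != core and reserved == addr:
                del self.reservation[other]

    def sc(self, core: int, addr: int, value: int) -> int:
        """SC: store only if the reservation survived; return 1 or 0 (rt)."""
        if self.reservation.get(core) != addr:
            return 0
        del self.reservation[core]
        self.sw(core, addr, value)
        return 1


def atomic_increment(mem: Memory, core: int, addr: int, interfere=None) -> int:
    """LL, modify, SC, and branch back while SC writes 0; returns attempt count."""
    attempts = 0
    while True:
        attempts += 1
        old = mem.ll(core, addr)
        if interfere:          # simulate another core's store landing in between
            interfere()
            interfere = None
        if mem.sc(core, addr, old + 1):
            return attempts
```

In this model, if another core stores 41 to the location between the LL and the SC, the first SC fails. The loop then reloads 41 and stores 42 on its second attempt, so no update is lost.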

Comparisons

  • MIPS32 Load/Store vs x86 MOV. MIPS32 strictly follows the load-store model: memory is accessed only by load and store instructions such as LW and SW, and all data processing happens in registers. In x86, MOV copies data between registers and memory, and arithmetic instructions such as ADD can take a memory operand directly. This makes the instruction set less orthogonal but improves code density.
  • MIPS32 Branch Delay Slot vs ARM Conditional Execution. In MIPS32, the architectural delay slot means that the instruction following a branch executes even when the branch is taken. The compiler or programmer must therefore fill the slot with useful work or a NOP. ARM has no branch delay slots. Instead, the classic 32-bit ARM instruction set lets almost every instruction execute conditionally based on the status flags, so short if-then sequences need no branch at all and avoid the associated pipeline bubbles. AArch64 dropped general predication in favor of conditional-select instructions.
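  • MIPS32 Flat Register File vs SPARC Register Windows. Standard MIPS32 uses a flat file of 32 registers. Under the common O32 calling convention, the first four integer arguments are passed in $a0-$a3 and any further arguments on the stack. A function must save callee-saved registers explicitly before using them. SPARC instead uses overlapping register windows. A SAVE instruction at the start of a subroutine switches to a new window, in which the caller's output registers become the callee's input registers and a fresh set of local and output registers is provided. This makes calls cheap. However, once the windows are exhausted, deep call nesting triggers window overflow and underflow traps, and a context switch must save every window in use.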
  • MIPS32 HI/LO vs RISC-V Multiply Extension. The classic MIPS32 MULT and DIV instructions place their results in the special HI and LO registers, which must then be read with separate MFHI and MFLO instructions. The RISC-V M extension instead provides three-operand instructions (MUL, MULH, MULHU, MULHSU, DIV, DIVU, REM, and REMU) that write the result directly to a general-purpose register. This is simpler for compilers and avoids extra architectural state that must be saved on context switches. MIPS32 Release 6 adopted the same approach.
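  • MIPS32 Coprocessor 0 vs ARM System Control Coprocessor. In MIPS32, exception handling and virtual memory management are controlled by Coprocessor 0, whose status fields and exception context are read and written with MFC0 and MTC0. ARMv7 uses an analogous mechanism: system control registers live in coprocessor 15 (CP15) and are accessed with MRC and MCR. A practical difference lies in exception entry. ARM banks the stack pointer, link register, and saved program status register for each processor mode (and R8-R12 in FIQ mode), so a handler starts with its own stack immediately. A MIPS handler starts with the interrupted program's registers and saves them step by step, using $k0 and $k1, the two registers reserved for the kernel, as scratch space.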
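A toy interpreter illustrates the delay-slot behavior described in the comparison above. This is a sketch under the following assumptions:

- The instruction tuples are hypothetical stand-ins for real encodings.
- List indices play the role of word addresses, so a branch target is computed relative to the delay slot, as in real MIPS.
- Branch-likely instructions, which annul the slot when not taken, are not modeled.

```python
# Toy model of the MIPS branch delay slot. Instruction tuples are
# hypothetical stand-ins; only the PC sequencing is the point.

def run(program: list, regs: list) -> list:
    """Execute a tiny program and return the indices that were executed.

    Supported forms (hypothetical):
      ("addiu", rt, rs, imm)   rt = rs + imm
      ("beq", rs, rt, offset)  if equal, branch to (delay-slot index) + offset
      ("halt",)
    """
    pc = 0
    pending = None          # branch target waiting for the delay slot to finish
    trace = []
    while program[pc][0] != "halt":
        op = program[pc]
        trace.append(pc)
        next_pc = pc + 1 if pending is None else pending
        pending = None
        if op[0] == "addiu":
            _, rt, rs, imm = op
            if rt != 0:                     # writes to $0 are discarded
                regs[rt] = (regs[rs] + imm) & 0xFFFFFFFF
        elif op[0] == "beq":
            _, rs, rt, offset = op
            if regs[rs] == regs[rt]:
                pending = pc + 1 + offset   # relative to the delay slot
        pc = next_pc
    return trace
```

With the program `[("beq", 0, 0, 2), ("addiu", 2, 0, 1), ("addiu", 2, 0, 99), ("addiu", 3, 2, 10), ("halt",)]`, the always-taken branch still lets the delay-slot instruction at index 1 run. The trace is therefore `[0, 1, 3]`, leaving $2 = 1 and $3 = 11, while the instruction at index 2 is skipped.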

OS and driver support

The MIPS32 architecture supports operating systems through a privilege hierarchy controlled by bits in the Status register. Kernel mode can access the entire address space and the CP0 registers. User mode can access only the TLB-mapped kuseg region (0x00000000-0x7FFFFFFF), and an attempt to execute a privileged instruction or access a kernel address raises an exception.

Drivers interact with hardware through exceptions and memory-mapped I/O. When a device asserts an interrupt, the processor jumps to the exception vector. The handler reads the Cause register (its ExcCode field and pending-interrupt IP bits) to identify the source, querying an external interrupt controller where one is present. The kernel then controls the device with ordinary load and store instructions to its registers, usually through the uncached kseg1 segment (0xA0000000-0xBFFFFFFF). No special port instructions like x86 IN and OUT are needed.
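The fixed segment map and the Cause register fields described above can be sketched as follows. The helper names are hypothetical. The device address 0x1F000900 used in the example is invented for illustration, and the ExcCode table covers only codes 0 through 13.

```python
# Sketch of the fixed MIPS32 virtual address map and of decoding the CP0
# Cause register in a handler. Helper names are hypothetical.

EXC_NAMES = {
    0: "Int", 1: "Mod", 2: "TLBL", 3: "TLBS", 4: "AdEL", 5: "AdES",
    6: "IBE", 7: "DBE", 8: "Sys", 9: "Bp", 10: "RI", 11: "CpU",
    12: "Ov", 13: "Tr",
}

def segment(vaddr: int) -> str:
    """Name of the segment containing a 32-bit virtual address."""
    vaddr &= 0xFFFFFFFF
    if vaddr < 0x80000000:
        return "kuseg"   # user and kernel, mapped through the TLB
    if vaddr < 0xA0000000:
        return "kseg0"   # kernel only, unmapped, normally cached
    if vaddr < 0xC0000000:
        return "kseg1"   # kernel only, unmapped, uncached
    if vaddr < 0xE0000000:
        return "kseg2"   # mapped (called ksseg when Supervisor mode exists)
    return "kseg3"       # kernel only, mapped

def physical(vaddr: int) -> int:
    """Physical address behind the unmapped kseg0/kseg1 windows (low 512 MB)."""
    if segment(vaddr) not in ("kseg0", "kseg1"):
        raise ValueError("address is TLB-mapped; translation needs the TLB")
    return vaddr & 0x1FFFFFFF

def uncached_alias(paddr: int) -> int:
    """kseg1 address a driver would use for a device register below 512 MB."""
    if paddr >= 0x20000000:
        raise ValueError("only the low 512 MB are reachable through kseg1")
    return 0xA0000000 | paddr

def decode_cause(cause: int) -> dict:
    exc = (cause >> 2) & 0x1F                       # ExcCode, bits 6..2
    ip = (cause >> 8) & 0xFF                        # IP7..IP0, bits 15..8
    return {
        "exc": EXC_NAMES.get(exc, f"code {exc}"),
        "pending": [i for i in range(8) if ip & (1 << i)],
        "in_delay_slot": bool((cause >> 31) & 1),   # BD bit
    }
```

For example:

- A device register at the hypothetical physical address 0x1F000900 would be accessed uncached at `0xBF000900`.
- A Cause value of `0x00000400` decodes as an interrupt (`"Int"`) with IP2 pending.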

Architecture-level security

Protection in MIPS32 is based on virtual memory access control through a page-based memory management unit. Each TLB entry carries a valid (V) bit and a dirty (D) bit; the D bit acts as write permission. The MMU raises the following exceptions:

- TLB Invalid, when accessing a page whose entry is not valid.
- TLB Modified, when writing to a page with D cleared.
- TLB Refill, when an address has no matching entry.

As a result, a process cannot reach memory the kernel has not mapped for it. User-mode accesses to kernel segments raise an Address Error exception.

To keep data pages from being executed, Release 3 added optional execute-inhibit (XI) and read-inhibit (RI) bits to TLB entries. Implementations without them cannot mark a readable page as non-executable. Instruction addresses must also be word-aligned, and jumping to a misaligned address raises an Address Error on the instruction fetch. This catches some corrupted jump targets but is not a substitute for execute protection.
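The checks described above can be summarized as a decision function applied after a TLB entry has matched; the TLB Refill case, where no entry matches, is not modeled. This is a sketch under stated assumptions:

- The bit positions V = bit 1, D = bit 2, XI = bit 30, and RI = bit 31 follow the EntryLo layout as assumed here. XI and RI exist only when enabled on Release 3 and later cores, so the exact layout and exception codes should be checked against the privileged-architecture manual.
- The returned exception names are descriptive labels, not ExcCode values.

```python
# Sketch of the permission checks applied once a TLB entry has matched
# (TLB Refill is not modeled). Assumed EntryLo bit positions:
# V = bit 1, D = bit 2, XI = bit 30, RI = bit 31 (XI/RI: Release 3+, optional).

V, D = 1 << 1, 1 << 2
XI, RI = 1 << 30, 1 << 31

def check_access(vaddr: int, size: int, kind: str, entry_lo: int,
                 user_mode: bool = True):
    """Return None if the access is allowed, else a descriptive exception name.

    kind is "fetch", "load" or "store"; size is the access width in bytes
    (4 for an instruction fetch).
    """
    if vaddr % size:
        return "Address Error"            # misaligned access or jump target
    if user_mode and vaddr >= 0x80000000:
        return "Address Error"            # kernel segment from user mode
    if not entry_lo & V:
        return "TLB Invalid"
    if kind == "store" and not entry_lo & D:
        return "TLB Modified"             # D clear: the page is read-only
    if kind == "fetch" and entry_lo & XI:
        return "Execute-Inhibit"
    if kind == "load" and entry_lo & RI:
        return "Read-Inhibit"
    return None
```

In this model, a writable data page marked `V | D | XI` accepts stores but refuses instruction fetches, while a code page marked only `V` can be executed but raises TLB Modified on a store.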

Logging and debugging system events

MIPS32 supports hardware breakpoints and tracing for debugging. The optional WatchLo and WatchHi register pairs in CP0 let software monitor instruction fetches, loads, or stores at a given virtual address, with an optional address mask, typically without slowing the pipeline. A match raises a Watch exception.

The EJTAG debug extension adds hardware breakpoints that raise a Debug exception instead, entering a separate debug mode. The cause of the debug event is recorded in the CP0 Debug register and the return address in DEPC.

For branch-history tracing, the optional PDtrace facility emits a compressed stream describing executed branches and their targets to an on-chip or off-chip trace buffer. From this stream, a debugger can reconstruct the exact execution path with little intrusion on real-time behavior.
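The watchpoint matching described above can be modeled in a few lines. This is a sketch under stated assumptions:

- WatchLo is assumed to hold the watched address in bits 31..3 and the enable bits I (fetch, bit 2), R (load, bit 1), and W (store, bit 0).
- Only the WatchHi Mask field (assumed at bits 11..3) is modeled; ASID and global matching are ignored.
- Field positions should be confirmed in the privileged-architecture manual for a given core.

```python
# Sketch of watchpoint matching. Assumed layout: WatchLo = VAddr[31:3] | I | R | W,
# WatchHi.Mask at bits 11..3 (address bits to ignore). ASID/G are not modeled.

W, R, I = 1 << 0, 1 << 1, 1 << 2
MASK = 0xFFFFFFFF

def watch_hit(watch_lo: int, watch_hi: int, vaddr: int, kind: str) -> bool:
    """Return True if this access would raise a Watch exception."""
    enable = {"fetch": I, "load": R, "store": W}[kind]
    if not watch_lo & enable:
        return False
    ignore = 0x7 | (watch_hi & 0xFF8)   # doubleword granularity plus Mask bits
    return (vaddr & ~ignore & MASK) == (watch_lo & ~ignore & MASK)
```

Under these assumptions, a store-only watchpoint on 0x10000100 fires for a store to 0x10000104 in the same doubleword, but not for a load there. Setting the mask to 0xFF8 widens the match to the whole 4 KB block.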

Limitations

A fundamental limitation of MIPS32 today is its 32-bit width of computation and addressing. With 32-bit general-purpose registers, each process has a linear virtual address space of at most four gigabytes, of which user code normally gets two. This boundary cannot be raised without moving to MIPS64, with its 64-bit registers and program counter. The optional extended physical addressing (XPA) of Release 5 enlarges physical memory, not the per-process address space.

Floating-point performance depends on the optional floating-point unit (Coprocessor 1), which embedded configurations often omit to save silicon. On such cores, every floating-point instruction raises a Coprocessor Unusable exception and must be emulated by the operating system at considerable cost, unless the code is compiled for software floating point.

The classic pipeline also exhibits load-use hazards. An instruction that uses the result of a load immediately after it stalls the pipeline for one cycle. The loaded value becomes available only at the end of the memory stage, one cycle too late to be forwarded to the execute stage.
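The load-use stall can be made concrete with a toy checker for the classic five-stage pipeline with full forwarding. This is a sketch under stated assumptions:

- Instructions are represented as hypothetical `(opcode, destination, sources)` tuples.
- Only the load-use case is modeled. Other hazards are ignored, as is the un-interlocked load delay slot of the original MIPS I.

```python
# Toy load-use hazard counter for a classic five-stage pipeline
# (IF, ID, EX, MEM, WB) with full forwarding. Tuples are hypothetical:
# (opcode, destination register, tuple of source registers).

def stall_cycles(program: list) -> int:
    """Count one-cycle bubbles caused by using a load result immediately.

    ALU results forward to the next instruction's EX stage in time, but
    load data is ready only at the end of MEM, one cycle too late.
    """
    stalls = 0
    for prev, cur in zip(program, program[1:]):
        op, dest, _ = prev
        _, _, sources = cur
        if op == "lw" and dest != 0 and dest in sources:
            stalls += 1
    return stalls


naive = [
    ("lw", 8, (9,)),          # lw   $t0, 0($t1)
    ("addu", 10, (8, 8)),     # addu $t2, $t0, $t0   uses $t0 at once
    ("lw", 11, (9,)),         # lw   $t3, 4($t1)
    ("addu", 12, (11, 10)),   # addu $t4, $t3, $t2   uses $t3 at once
]
# Moving the independent second load up hides both load latencies.
scheduled = [naive[0], naive[2], naive[1], naive[3]]
```

Here `stall_cycles(naive)` reports two stalls and `stall_cycles(scheduled)` reports none. This is the kind of reordering a compiler's instruction scheduler performs to hide the hazard.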

History and evolutionary development

The architecture was developed at Stanford University and commercialized by MIPS Computer Systems. It evolved from the original R2000 implementation to the standardized MIPS32 specification of 1999. MIPS32 unified the privileged resources, such as memory management and exception handling, which had differed between processor generations and vendors. It kept the user-level instruction set compatible so that existing software continued to run.

The LL and SC instructions, introduced with MIPS II, gave the architecture a basis for multiprocessor synchronization on shared memory. Release 6 (2014) broke binary compatibility with earlier releases: it removed the branch-likely instructions, added compact branches without delay slots, and replaced the HI/LO multiply and divide instructions with three-operand forms.

MIPS32 cores remain in use in network routers and microcontrollers. There, simple in-order pipelines without speculative execution give precisely predictable timing, which makes the architecture suitable for hard real-time systems with deterministic interrupt latency.