MIPS64 (64-bit RISC architecture with fixed instruction length)

MIPS64 is the 64-bit extension of the classic MIPS architecture and remains backward compatible with 32-bit MIPS code. It uses a reduced instruction set with fixed 32-bit instruction words, most of which are designed to complete in a single pipeline cycle. It has 64-bit general-purpose registers and a virtual address space that can in principle reach 16 exabytes (2^64 bytes), although real implementations decode fewer address bits.

MIPS64 was first used in high-performance Silicon Graphics workstations and servers. It later spread to complex embedded systems: Cavium Octeon network processors, game consoles such as the Sony PlayStation 2, automotive infotainment units and industrial controllers. It has been licensed for specialized chips that need a balance between low power consumption and 64-bit computation, especially in network packet-processing equipment.

Typical problems

The main difficulty is incompatibility with the dominant x86-64 ecosystem, which limits the porting of existing binaries and requires cross-compilation. Code ported from 32-bit targets often suffers from sign-extension errors, because MIPS64 keeps 32-bit values sign-extended in its 64-bit registers. The architecture is also fragmented across several revisions (MIPS64 Release 1 through Release 6, with Release 6 not binary-compatible with earlier releases) and across vendor-specific extensions, which complicates standardization. Branch delay slots and strict data alignment requirements make hand-optimizing assembly code harder.
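
The sign-extension pitfall can be modeled in a few lines. This is a minimal sketch, assuming plain Python integers stand in for 64-bit registers; `sign_extend_32` and `zero_extend_32` are hypothetical helper names, not part of any toolchain. It shows the register image a sign-extending 32-bit load (LW) leaves compared with a zero-extending one (LWU): the 32-bit kernel address 0x80001000 becomes 0xFFFFFFFF80001000 in one case and 0x0000000080001000, a different address in the user segment, in the other.

```python
MASK32 = 0xFFFF_FFFF


def sign_extend_32(word: int) -> int:
    """Register image after a sign-extending 32-bit operation such as LW or ADDU:
    bit 31 is copied into bits 63..32."""
    word &= MASK32
    if word & 0x8000_0000:
        return word | 0xFFFF_FFFF_0000_0000
    return word


def zero_extend_32(word: int) -> int:
    """Register image after LWU: the upper 32 bits are cleared."""
    return word & MASK32


kseg0_addr = 0x8000_1000
print(hex(sign_extend_32(kseg0_addr)))  # 0xffffffff80001000
print(hex(zero_extend_32(kseg0_addr)))  # 0x80001000
```

Ported code that rebuilds a pointer from a 32-bit value has to choose one of the two deliberately; mixing them is the usual source of these bugs.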

How MIPS64 works

The classic MIPS64 implementation uses the five-stage RISC pipeline (instruction fetch, decode, execute, memory access and writeback), so up to five instructions are in flight at once. Compared with the 32-bit version, all general-purpose registers are 64 bits wide, the arithmetic-logic unit operates on doublewords, and the LD and SD instructions transfer 8 bytes in a single memory access. Like ARM64, MIPS64 uses fixed 32-bit instructions with a three-operand register format, but its encoding is simpler, built around three main formats (R, I and J). Unlike x86-64, where the hardware translates variable-length CISC instructions into micro-operations for out-of-order execution, MIPS64 is designed for simple decoding: many MIPS64 cores, especially embedded ones, execute in order and rely on the compiler to schedule instructions and fill branch delay slots, although out-of-order 64-bit MIPS designs such as the R10000 also exist. The instruction set includes LL/SC (load-linked/store-conditional) pairs for multiprocessor synchronization without locking the bus. Virtual memory uses a software-managed TLB: a TLB miss raises an exception and the operating system refills the entry, which gives flexibility in page-table design but costs more than the hardware page-table walk used in ARM. Coprocessor 0 (CP0) manages exceptions, the TLB and cache configuration, and keeps interrupt handling deterministic, which is important for real-time systems.
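
The LL/SC retry pattern mentioned above can be illustrated with a single-threaded simulation. This is a toy model, not real concurrent code: `LLSCMemory`, `atomic_add` and the `between` hook are hypothetical names, and each CPU's reservation (its LLbit) is tracked per address rather than per cache line. Real programs would use compiler atomics, which GCC and Clang typically lower to LLD/SCD loops on MIPS64.

```python
MASK64 = (1 << 64) - 1


class LLSCMemory:
    """Toy model of LLD/SCD: each CPU holds at most one reservation (its LLbit)."""

    def __init__(self):
        self.mem = {}
        self.reservation = {}  # cpu id -> reserved address

    def _break_reservations(self, addr):
        # Any successful write to addr clears every reservation on it.
        for cpu in [c for c, a in self.reservation.items() if a == addr]:
            del self.reservation[cpu]

    def lld(self, cpu, addr):
        self.reservation[cpu] = addr
        return self.mem.get(addr, 0)

    def sd(self, cpu, addr, value):
        # cpu is unused: an ordinary store from any CPU breaks reservations.
        self._break_reservations(addr)
        self.mem[addr] = value & MASK64

    def scd(self, cpu, addr, value):
        """Store only if this CPU's reservation survived; 1 on success, 0 on failure."""
        if self.reservation.get(cpu) != addr:
            return 0
        self._break_reservations(addr)
        self.mem[addr] = value & MASK64
        return 1


def atomic_add(mem, cpu, addr, delta, between=None):
    """LLD, compute, SCD; retry until SCD succeeds. `between(attempt)` lets a
    caller inject another CPU's store between the LLD and the SCD."""
    attempt = 0
    while True:
        attempt += 1
        old = mem.lld(cpu, addr)
        if between is not None:
            between(attempt)
        if mem.scd(cpu, addr, old + delta):
            return old, attempt


mem = LLSCMemory()
mem.sd(0, 0x1000, 5)
# CPU 1 writes the same doubleword during CPU 0's first attempt only.
old, attempts = atomic_add(mem, 0, 0x1000, 1,
                           between=lambda n: mem.sd(1, 0x1000, 100) if n == 1 else None)
print(old, attempts, mem.mem[0x1000])  # 100 2 101
```

The first SCD fails because CPU 1's store cleared the reservation, so the loop reloads the new value and succeeds on the second attempt; no bus lock is ever taken.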

MIPS64 functionality

  1. Instruction set architecture and addressing modes. The base machine word of MIPS64 is 64 bits, so 64-bit integers and addresses are handled directly. The principal memory addressing mode is a base register plus a 16-bit signed offset.
  2. Compatibility with MIPS32. MIPS64 is a superset of MIPS32: 32-bit instructions leave their results sign-extended in the 64-bit registers, so MIPS32 user code from the same release family runs unmodified, without recompilation.
  3. 64-bit integer arithmetic operations. The wider registers brought doubleword instructions such as DADD, DADDU, DADDIU, DSUB and DSUBU, which add and subtract full 64-bit register contents. The "U" variants wrap modulo 2^64 without trapping (despite the name, DADDIU sign-extends its 16-bit immediate), while DADD and DSUB raise an Integer Overflow exception on signed overflow.
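  4. Doubleword multiplication and division. DMULT, DMULTU, DDIV and DDIVU place their results in the special HI and LO registers, which are read back with MFHI and MFLO. For multiplication the 128-bit product is split, with the upper 64 bits in HI and the lower 64 bits in LO; for division the quotient goes to LO and the remainder to HI. Release 6 removes HI and LO and replaces these instructions with forms such as DMUL, DMUH, DDIV and DMOD that write a general-purpose register directly.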
  5. Doubleword load and store. LD and SD transfer eight bytes between memory and a register in a single access. The address must be naturally aligned to an 8-byte boundary; otherwise the processor raises an Address Error exception, so every completed LD or SD is one indivisible transfer.
  6. Shift operations. DSLL, DSRL and DSRA shift a 64-bit operand by a 5-bit immediate amount (0 to 31); DSLL32, DSRL32 and DSRA32 add 32 to that amount, and the variable forms DSLLV, DSRLV and DSRAV take the shift amount from the lower six bits of a source register. DSRA and its variants perform an arithmetic right shift, filling vacated positions with the value of the most significant bit.
  7. Bitwise logical operations on doublewords. Standard AND, OR, XOR and NOR instructions automatically apply to all 64 bits of the registers. This allows efficient masking of upper address bits and operating on status flags without additional extension instructions.
  8. Coprocessor 0 register access. The DMFC0 instruction copies the contents of a 64-bit CP0 system coprocessor register into the general register file, and DMTC0 performs the reverse operation. This is essential for saving and restoring the full 64-bit virtual address context, such as EPC and BadVAddr, during exception handling.
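  9. Virtual memory system and TLB. The memory management unit translates 64-bit virtual addresses (of which an implementation decodes only a subset) into physical addresses whose width is implementation-dependent, for example 36 bits on the R4000, rather than a full 64 bits. Each TLB entry maps an even/odd pair of virtual pages through EntryLo0 and EntryLo1, and its PageMask field selects the page size in powers of four from 4 KB to 256 MB.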
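  10. Processor configuration registers. The CP0 Config register reports the architecture revision (AR field) and the architecture type (AT field), which distinguishes MIPS32 from MIPS64 implementations and indicates whether 64-bit segments are accessible. The UX and SX bits in the Status register enable 64-bit operations and addressing in user and supervisor modes, and KX enables access to the 64-bit kernel segments; with UX cleared, user code is limited to 32-bit addressing and 64-bit instructions raise a Reserved Instruction exception.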
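  11. Exception management in 64-bit mode. Exception vectors normally lie in the sign-extended kseg0 and kseg1 segments: the BEV bit in the Status register selects between bootstrap vectors in uncached ROM space and the normal vectors, and from Release 2 the EBase register lets software relocate the normal vector base. The ExcCode field of the Cause register identifies the exception, and the EPC register holds the full 64-bit address at which execution resumes; if the faulting instruction sat in a branch delay slot, EPC points at the branch and the BD bit in Cause is set.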
  12. Alignment error prevention. The LDL/LDR and SDL/SDR instruction pairs load and store an unaligned doubleword in two operations, each merging the bytes that fall within one aligned doubleword into or out of the register without raising an exception. Because two separate memory accesses are involved, the combined operation is not atomic. Release 6 removes these instructions.
  13. Atomic memory synchronization. The LL and SC pair has 64-bit counterparts, LLD and SCD. LLD loads a doubleword and sets the LLbit, after which the hardware monitors the location (typically at cache-line granularity) for conflicting writes. SCD stores only if the LLbit is still set and writes 1 to its source register on success or 0 on failure, for example when another processor wrote the line or an exception return (ERET) intervened.
  14. Indirect addressing and far branches. JR and JALR take the full 64-bit target address from a register, so they can reach any point in the single flat virtual address space. The direct J and JAL instructions, by contrast, encode a 26-bit word index that only reaches targets within the current 256 MB-aligned region, so calls beyond that range go through JR or JALR.
  15. Floating-point coprocessor (FPU). The architecture defines 32 floating-point registers (FPRs), each 64 bits wide, for IEEE 754 double-precision numbers. LDC1 and SDC1 load and store 8-byte values, and arithmetic instructions such as ADD.D, SUB.D and MUL.D operate on them directly.
  16. Raw transfers between FPU and integer registers. DMFC1 copies the bit pattern of a 64-bit FPU register into an integer register without conversion, and DMTC1 does the reverse, which is used to build floating-point values bit by bit, for example special constants in software emulation. Numeric conversions are done separately by the CVT instructions, such as CVT.D.L (64-bit integer to double).
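  17. Kernel and privileged software addressing. The kernel can reach physical memory without TLB translation through unmapped segments. CKSEG0 (cached) and CKSEG1 (uncached), the sign-extended forms of the 32-bit kseg0 and kseg1, cover the lowest 512 MB, while XKPHYS covers the entire physical address space and uses address bits 61:59 to select the cache coherency attribute (cacheable, uncached and so on) of each access.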
  18. Zero-extending word loads. LW loads a 32-bit value and sign-extends it to 64 bits, whereas LWU zero-extends it, clearing the upper 32 bits of the register. Using LWU for unsigned 32-bit data prevents the unintended sign extension that often appears in address calculations when porting 32-bit code to MIPS64.
  19. Comparison and result setting. SLT, SLTU, SLTI and SLTIU compare full 64-bit doublewords (signed or unsigned) and write 1 or 0 to the destination register. Because comparisons write an ordinary general-purpose register instead of a condition-flags register, there are no partial flag updates to track, which simplifies the pipeline.
  20. Performance counters and tracing. In MIPS64 implementations the CP0 performance counters may be 64 bits wide, which makes overflow rare. The optional instruction trace block can emit records with full 64-bit branch addresses and timestamps for debugging complex multithreaded real-time systems.
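
The trapping rule in item 3 can be modeled directly. This is a minimal sketch, assuming Python integers masked to 64 bits stand in for registers; the functions mirror the mnemonics but are illustrative only, and `IntegerOverflow` is a hypothetical stand-in for the hardware exception.

```python
MASK64 = (1 << 64) - 1


class IntegerOverflow(Exception):
    """Stands in for the MIPS Integer Overflow exception."""


def to_signed64(x: int) -> int:
    x &= MASK64
    return x - (1 << 64) if x & (1 << 63) else x


def daddu(rs: int, rt: int) -> int:
    """DADDU: addition modulo 2**64, never traps."""
    return (rs + rt) & MASK64


def daddiu(rs: int, imm16: int) -> int:
    """DADDIU: the 16-bit immediate is sign-extended, then added without trapping."""
    imm = imm16 & 0xFFFF
    if imm & 0x8000:
        imm -= 0x1_0000
    return (rs + imm) & MASK64


def dadd(rs: int, rt: int) -> int:
    """DADD: same result bits as DADDU, but traps on signed overflow."""
    exact = to_signed64(rs) + to_signed64(rt)
    if not -(1 << 63) <= exact < (1 << 63):
        raise IntegerOverflow(f"{exact} does not fit in a signed doubleword")
    return exact & MASK64


print(hex(daddu(0x7FFF_FFFF_FFFF_FFFF, 1)))  # 0x8000000000000000
print(daddiu(5, 0xFFFF))                     # 4
```

The same operands passed to `dadd` raise `IntegerOverflow`, which is why compilers emit the "U" forms for ordinary C integer arithmetic.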
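
The HI/LO split described in item 4 is easy to check with arbitrary-precision integers. This sketch models only the pre-Release 6 unsigned forms; `dmultu` and `ddivu` are illustrative names, and the model raises on division by zero where real hardware would leave HI and LO unpredictable.

```python
MASK64 = (1 << 64) - 1


def dmultu(rs, rt):
    """DMULTU: return (HI, LO) for the unsigned 128-bit product."""
    product = (rs & MASK64) * (rt & MASK64)
    return product >> 64, product & MASK64


def ddivu(rs, rt):
    """DDIVU: return (HI, LO) = (remainder, quotient)."""
    rs, rt = rs & MASK64, rt & MASK64
    if rt == 0:
        raise ZeroDivisionError("DDIVU by zero: HI/LO would be unpredictable")
    return rs % rt, rs // rt


hi, lo = dmultu(MASK64, MASK64)
print(hex(hi), hex(lo))  # 0xfffffffffffffffe 0x1
print(ddivu(100, 7))     # (2, 14)
```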
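
The shift-amount rules from item 6 can be sketched as follows, assuming masked Python integers as registers; the function names copy the mnemonics but are illustrative only. Note that the immediate forms accept only 0 to 31, while the variable forms silently use the low six bits of the amount register.

```python
MASK64 = (1 << 64) - 1


def _signed(x):
    x &= MASK64
    return x - (1 << 64) if x >> 63 else x


def _check_sa(sa):
    if not 0 <= sa <= 31:
        raise ValueError("the sa field is 5 bits (0..31)")


def dsll(rt, sa):
    _check_sa(sa)
    return (rt << sa) & MASK64


def dsll32(rt, sa):
    _check_sa(sa)
    return (rt << (sa + 32)) & MASK64  # shift by sa + 32


def dsllv(rt, rs):
    return (rt << (rs & 0x3F)) & MASK64  # only the low six bits of rs count


def dsrlv(rt, rs):
    return (rt & MASK64) >> (rs & 0x3F)  # logical: zeros shifted in


def dsrav(rt, rs):
    # Python's >> on negative ints is arithmetic, so the sign bit is replicated.
    return (_signed(rt) >> (rs & 0x3F)) & MASK64


print(dsllv(1, 65))             # 2, because 65 & 63 == 1
print(hex(dsrav(1 << 63, 63)))  # 0xffffffffffffffff
```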
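
The page-pair organization from item 9 can be sketched as a pair of helpers: one computes the PageMask value for a page size, the other splits a virtual address into the page-pair number compared against EntryHi, the even/odd selector and the page offset. This is a simplified model under stated assumptions: `page_mask` and `split_vaddr` are hypothetical names, and region bits, ASIDs and SEGBITS limits are ignored.

```python
MIN_PAGE = 4 * 1024
MAX_PAGE = 256 * 1024 * 1024


def page_mask(page_size):
    """PageMask for a page size from 4 KB to 256 MB in powers of four.
    The set bits (28..13) mark address bits excluded from the VPN2 comparison."""
    units = page_size // MIN_PAGE
    exponent = units.bit_length() - 1
    if (page_size < MIN_PAGE or page_size > MAX_PAGE or page_size % MIN_PAGE
            or units & (units - 1) or exponent % 2):
        raise ValueError(f"unsupported page size {page_size}")
    return (units - 1) << 13


def split_vaddr(vaddr, page_size):
    """Return (page_pair_number, odd, offset); odd=0 uses EntryLo0, odd=1 EntryLo1."""
    page_mask(page_size)                # validate the size
    shift = page_size.bit_length() - 1  # log2(page_size)
    return vaddr >> (shift + 1), (vaddr >> shift) & 1, vaddr & (page_size - 1)


print(hex(page_mask(16 * 1024)))   # 0x6000
print(split_vaddr(0x3000, 4096))   # (1, 1, 0)
```

Because one entry covers two adjacent pages, a software refill handler fills both EntryLo0 and EntryLo1 at once.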
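
The 256 MB reach of J and JAL in item 14 follows from how the target is formed: the 26-bit index, shifted left by two, replaces the low 28 bits of the delay-slot address. A minimal sketch, with hypothetical helper names, of how a linker or assembler might decide whether a direct jump can encode a target or an indirect JR/JALR is needed:

```python
MASK64 = (1 << 64) - 1
REGION = 1 << 28  # 256 MB


def j_target(pc, instr_index):
    """Target of J/JAL at address pc: instr_index << 2 replaces the low
    28 bits of the delay-slot address (pc + 4)."""
    if not 0 <= instr_index < (1 << 26):
        raise ValueError("instr_index is a 26-bit field")
    region_base = ((pc + 4) & MASK64) & ~(REGION - 1)
    return region_base | (instr_index << 2)


def reachable_by_j(pc, target):
    """True if J/JAL at pc can encode target; otherwise JR/JALR is required."""
    same_region = ((pc + 4) & MASK64) >> 28 == (target & MASK64) >> 28
    return target % 4 == 0 and same_region


print(hex(j_target(0xFFFF_FFFF_8000_0000, 0x40)))  # 0xffffffff80000100
print(reachable_by_j(0x0040_0000, 0x1000_0000))    # False
```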

Comparisons

  • MIPS64 vs SPARC V9. Both architectures are classic RISC designs with fixed-length instructions, but they took different routes to 64 bits. SPARC V9 widened the registers while keeping the register window mechanism inherited from earlier SPARC versions, which speeds up procedure calls at the cost of extra hardware and more expensive context switches. MIPS64 keeps a flat file of 32 general-purpose registers, trading that call-time convenience for simpler hardware and more predictable pipeline timing, which makes it attractive for embedded systems and network equipment.
  • MIPS64 vs ARMv8-A (AArch64). The comparison reveals the gap between the orthodox RISC of the genre's formative period and a modern energy-efficient architecture. MIPS64 was designed around a classic five-stage pipeline, a simple decoder and, before Release 6, a branch delay slot on every branch. AArch64 has no delay slots, drops the general predicated execution of 32-bit ARM in favor of a few conditional instructions such as conditional select, and provides 31 general-purpose registers, shifting the emphasis from MIPS-style hardware simplicity to wide superscalar decoding and aggressive power management in mobile devices.
  • MIPS64 vs RISC-V RV64I. Both instruction sets follow the RISC principles of Hennessy and Patterson, but MIPS64 is a mature, commercially driven design, while RV64I is a later clean-slate design free of accumulated architectural compromises. The key difference lies in conditional branch handling: MIPS64 branches (other than the compact branches added in Release 6) have a delay slot that the compiler must fill, whereas RISC-V deliberately has none. This simplifies superscalar and out-of-order implementations and branch prediction, at the cost of a possible pipeline bubble after taken branches in the simplest in-order pipeline.
  • MIPS64 vs DEC Alpha. Both architectures moved to 64 bits early but balanced performance and complexity differently. DEC Alpha was designed for speed at almost any cost: it omitted even an integer divide instruction, leaving division to software, in order to reach very high clock frequencies, and it had no 32-bit execution mode. MIPS64 took a more pragmatic path, keeping a full set of integer instructions, including division, and compatibility with 32-bit MIPS code, which eased application migration at the cost of carrying 32-bit sign-extension conventions into the 64-bit instruction set.
  • MIPS64 vs Intel EM64T (x86-64). This comparison most clearly illustrates the conflict between CISC legacy and pure 64-bit RISC. MIPS64 runs 32-bit code with the same uniform, fixed-length instruction set and addresses memory flatly without segmentation. EM64T extends the IA-32 architecture, keeps variable-length instructions that the decoder translates into micro-operations, and retains legacy and compatibility modes for older code. The advantage of MIPS64 remains a much smaller and less power-hungry decoder, while EM64T wins on its software ecosystem and its ability to run legacy 32-bit code alongside 64-bit code in compatibility mode.

OS and driver support

The MIPS64 architecture supports operating systems through a privileged Kernel Mode, in which software can access the Coprocessor 0 (CP0) control registers responsible for exception handling, virtual address translation through the TLB (Translation Lookaside Buffer) and cache configuration. Device drivers typically reach peripheral registers through the kseg1 segment, which is unmapped (it bypasses the TLB) and uncached, so each access goes straight to the device instead of returning a stale cached value. Because kseg1 covers only the lowest 512 MB of physical memory, 64-bit kernels reach devices above that limit through uncached XKPHYS addresses.
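
The two ways of forming an uncached device address described above can be sketched as plain arithmetic. This is a minimal illustration, not a driver API: `ckseg1_addr`, `xkphys_addr` and the default `pabits=48` are assumptions (the physical address width is implementation-specific), and cache coherency attribute 2 is used as the uncached encoding. The example physical address 0x1FC00000 is where the MIPS reset vector lives.

```python
CKSEG1_BASE = 0xFFFF_FFFF_A000_0000  # sign-extended kseg1: unmapped, uncached
XKPHYS_BASE = 0x8000_0000_0000_0000  # xkphys: address bits 63:62 = 0b10
CCA_UNCACHED = 2                     # cache coherency attribute for uncached access
KSEG1_LIMIT = 512 * 1024 * 1024


def ckseg1_addr(paddr):
    """Uncached, unmapped alias of a physical address in the low 512 MB."""
    if not 0 <= paddr < KSEG1_LIMIT:
        raise ValueError("kseg1 only reaches the lowest 512 MB of physical memory")
    return CKSEG1_BASE | paddr


def xkphys_addr(paddr, cca=CCA_UNCACHED, pabits=48):
    """xkphys alias of paddr; bits 61:59 carry the cache coherency attribute.
    pabits is implementation-specific; 48 is only an assumed example."""
    if not 0 <= cca <= 7:
        raise ValueError("CCA is a 3-bit field")
    if not 0 <= paddr < (1 << pabits):
        raise ValueError("physical address wider than the implementation supports")
    return XKPHYS_BASE | (cca << 59) | paddr


print(hex(ckseg1_addr(0x1FC0_0000)))    # 0xffffffffbfc00000
print(hex(xkphys_addr(0x1_2000_0000)))  # 0x9000000120000000
```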

Security and process isolation

Security in MIPS64 rests on three privilege levels (kernel, supervisor and user) selected by the KSU field of the Status register, with the processor forced into kernel mode whenever the EXL or ERL bit is set, for example while handling an exception. User mode is isolated from the kernel in two ways: privileged CP0 instructions such as MTC0 or ERET raise a Coprocessor Unusable exception unless the kernel has set the CU0 bit, and any user-mode access to a kernel or supervisor address raises an Address Error exception. Memory protection between processes comes from the TLB: each entry is tagged with an address-space identifier (ASID) taken from EntryHi, so a process only hits entries belonging to its own address space and the operating system does not have to flush the TLB on every context switch.
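
The mode rules above can be expressed as a small decoder for a Status register value. This sketch assumes the standard bit positions (EXL bit 1, ERL bit 2, KSU bits 4:3, UX bit 5, CU0 bit 28); the function names are hypothetical and the model ignores debug mode and implementation-specific bits.

```python
STATUS_EXL = 1 << 1
STATUS_ERL = 1 << 2
STATUS_KSU_SHIFT = 3  # KSU occupies bits 4:3
STATUS_UX = 1 << 5
STATUS_CU0 = 1 << 28


def operating_mode(status):
    """Privilege level implied by a Status register value."""
    if status & (STATUS_EXL | STATUS_ERL):
        return "kernel"  # exception or error level forces kernel mode
    ksu = (status >> STATUS_KSU_SHIFT) & 0b11
    if ksu == 0b00:
        return "kernel"
    if ksu == 0b01:
        return "supervisor"
    if ksu == 0b10:
        return "user"
    raise ValueError("KSU = 0b11 is reserved")


def cp0_usable(status):
    """CP0 instructions work in kernel mode, elsewhere only if CU0 is set;
    otherwise they raise a Coprocessor Unusable exception."""
    return operating_mode(status) == "kernel" or bool(status & STATUS_CU0)


def user_64bit_enabled(status):
    return bool(status & STATUS_UX)


print(operating_mode(0x10), cp0_usable(0x10))  # user False
print(operating_mode(0x12))                    # kernel (EXL overrides KSU)
```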

Logging

Logging and tracing in MIPS64 are handled by the on-chip PDTrace (Processor Debug Trace) block. It generates a compact packet stream recording branch addresses, exception causes and incremental cycle counts, and sends it over a high-speed Trace Interface (TIF) to an external analyzer without stopping the core. Data watchpoints are configured through the WatchLo and WatchHi registers: when an instruction fetch, load or store address matches the programmed value, the processor raises a Watch exception.
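
A watchpoint can be sketched as an encoding step plus a match test. This model assumes the MIPS64 WatchLo layout (doubleword address in bits 63:3, I/R/W enable bits in 2:0) and matches at doubleword granularity; it deliberately ignores WatchHi address masking and ASID checks, and `watchlo_value` and `watch_hits` are hypothetical names.

```python
MASK64 = (1 << 64) - 1
WATCH_W = 1 << 0  # trigger on stores
WATCH_R = 1 << 1  # trigger on loads
WATCH_I = 1 << 2  # trigger on instruction fetches


def watchlo_value(vaddr, on_store=False, on_load=False, on_fetch=False):
    """Encode WatchLo: doubleword address in bits 63:3, enable bits in 2:0."""
    value = vaddr & MASK64 & ~0x7
    if on_store:
        value |= WATCH_W
    if on_load:
        value |= WATCH_R
    if on_fetch:
        value |= WATCH_I
    return value


_KIND_BIT = {"store": WATCH_W, "load": WATCH_R, "fetch": WATCH_I}


def watch_hits(watchlo, addr, kind):
    """Would an access of the given kind to byte address addr raise a Watch exception?"""
    enabled = watchlo & _KIND_BIT[kind]
    return bool(enabled) and (watchlo >> 3) == ((addr & MASK64) >> 3)


w = watchlo_value(0x1000_0010, on_store=True)
print(hex(w), watch_hits(w, 0x1000_0014, "store"))  # 0x10000011 True
```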

Technical limitations

Among the fundamental limitations of MIPS64 is the branch delay slot: the instruction following a branch executes whether or not the branch is taken (the branch-likely variants annul it when the branch is not taken), which shifts the burden of filling the slot with useful work onto the compiler. Another is the weakly ordered memory model, which gives no hardware guarantee that memory accesses become visible in program order. Software must insert SYNC barriers wherever ordering matters, and on systems without hardware I/O coherence, drivers must also write back or invalidate cache lines with the CACHE instruction around DMA transfers.

History, evolution and current status

MIPS64 traces back to the MIPS III architecture from MIPS Computer Systems, which first introduced 64-bit integer registers and a flat 64-bit address space. The Release 6 specification later re-encoded parts of the machine code, added compact branches without delay slots, removed the branch-likely instructions and aligned floating-point arithmetic with the IEEE 754-2008 standard. After MIPS passed to Wave Computing, the company announced in 2021 that its future processor designs would be based on RISC-V; the architecture lives on in existing verified processor cores for embedded systems that need deterministic interrupt response with minimal die area.