Thumb (ARM Compressed Instruction Set) is an instruction set for ARM processors in which a subset of the standard 32-bit ARM instructions is re-encoded in a compact 16-bit format and executed in a dedicated processor state. Thumb code is typically about one third smaller than the equivalent ARM code, which is critically important for memory-constrained devices, though at the cost of some performance in computational tasks.
The Thumb instruction set is widely used in microcontrollers (Cortex-M series) and real-time embedded systems, where code density matters more than peak computational speed. It is common in consumer electronics, automotive control units, and wearable devices. Its successor, Thumb-2, combines 16- and 32-bit instructions in one stream. Cortex-M cores execute Thumb code exclusively, while more powerful application processors (Cortex-A) support both the ARM and Thumb-2 sets, allowing the operating system and applications to balance performance against memory savings.
The main limitation of classic Thumb is its reduced opcode set: not every ARM operation has a 16-bit equivalent. Most instructions can access only the lower half of the register file (R0–R7), and addressing modes and condition flag handling are significantly cut down. There is no built-in support for coprocessors or for certain arithmetic instructions. Code that needs these features has to switch between the ARM and Thumb states with the BX or BLX instructions, which adds overhead to library function calls.
How Thumb works
In Thumb state, the processor decompresses instructions on the fly: on ARM7TDMI-class cores, the decoder at the pipeline input converts each 16-bit code into the equivalent 32-bit operation of the internal representation. Shorter instructions also relieve the memory bus. On a 32-bit bus, two instructions are fetched in a single read cycle instead of one. On a narrow 16-bit data bus, which is common with slow flash memory, each Thumb instruction needs one read where an ARM instruction needs two.
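The bus-traffic argument above can be checked with simple arithmetic. The following is a minimal sketch, assuming a straight-line instruction stream fetched sequentially from aligned addresses, with no cache or prefetch buffer; the function name bus_reads is hypothetical and chosen only for this illustration.

```python
def bus_reads(num_instructions: int, instr_bits: int, bus_bits: int) -> int:
    """Count bus read cycles needed to fetch a straight-line instruction stream.

    Assumptions (illustrative only): sequential aligned fetches, no cache,
    no prefetch buffer, no branches.
    """
    total_bits = num_instructions * instr_bits
    return -(-total_bits // bus_bits)  # ceiling division


# Compare 1000 ARM (32-bit) and 1000 Thumb (16-bit) instructions
# on a 32-bit bus and on a 16-bit bus.
for bus in (32, 16):
    arm = bus_reads(1000, 32, bus)
    thumb = bus_reads(1000, 16, bus)
    print(f"{bus}-bit bus: ARM needs {arm} reads, Thumb needs {thumb} reads")
```

Under these assumptions the number of reads is halved for Thumb on either bus width. Real systems with caches, wait states, or branch-heavy code will see a smaller difference.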
For comparison, classic ARM mode always operates with full-size 32-bit instructions, almost all of which can be executed conditionally and can operate on any register. Thumb sacrifices this orthogonality for compactness: high registers are accessible only through a few special instruction forms, and conditional execution is unavailable for most instructions. A direct competitor is the RISC-V architecture with its optional compressed extension, which, unlike classic Thumb, does not require global mode switching: 16- and 32-bit instructions mix freely in a single stream without transition instructions. Thumb-2 removed this limitation on the ARM side, allowing instruction formats to mix in a single code segment and providing compression benefits without losing the flexibility of a full instruction set.
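Mixing instruction widths in one stream works because the decoder can tell the length of an instruction from its first halfword. The sketch below shows the two length rules side by side: in Thumb-2, a halfword whose top five bits are 0b11101, 0b11110, or 0b11111 starts a 32-bit instruction; in RISC-V with the compressed extension, the two lowest bits being 0b11 mark a 32-bit instruction. The function names are hypothetical, and the RISC-V helper assumes only the standard 16- and 32-bit lengths, ignoring the longer reserved encodings.

```python
def thumb2_length(first_halfword: int) -> int:
    """Return the instruction length in bytes for a Thumb-2 stream."""
    top5 = (first_halfword >> 11) & 0x1F
    return 4 if top5 in (0b11101, 0b11110, 0b11111) else 2


def rvc_length(first_halfword: int) -> int:
    """Return the instruction length in bytes for a RISC-V stream with RVC.

    Assumption: only 16- and 32-bit encodings occur.
    """
    return 4 if (first_halfword & 0b11) == 0b11 else 2


# 0x4770 is BX LR (16-bit); 0xF000 is the first halfword of a BL.
print(thumb2_length(0x4770), thumb2_length(0xF000))
# 0x0001 is C.NOP (16-bit); 0x0013 is the low halfword of the 32-bit NOP.
print(rvc_length(0x0001), rvc_length(0x0013))
```

In both cases the decision uses only bits of the instruction itself, so no processor-wide state has to change between a 16-bit and a 32-bit instruction.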
Thumb functionality
- THUMB.1 instruction encoding. The THUMB.1 instruction format uses a 16-bit codeword, where the upper five bits are fixed to 01000, and the remaining eleven bits define the source register and destination register.
- Assembler syntax. The instruction is written as THUMB {cond}, where the condition field is optional and follows the standard conditional execution rules of the ARM architecture inside IT blocks.
- Destination register field Rd. The 3-bit Rd field in the encoding directly addresses a general-purpose register R0-R7, which receives the transformation result computed by the context switch hardware block.
- Source register field Rm. The 3-bit Rm field specifies the input operand from the low register bank, whose value is used to generate a pointer to the alternative processor state.
- PSTATE operation logic. When executing the instruction, the core analyzes the current PSTATE.T mode bit and, depending on it, toggles the bit while simultaneously modifying the program counter to branch to the correct address.
- T bit toggling. The hardware inverts the PSTATE.T bit, immediately changing the width of decoded instructions from 16-bit to 32-bit or vice versa without intermediate wait cycles.
- Target address generation. The branch address is computed by adding the value of Rm to the current aligned program counter value, with the least significant bit of the result forcibly cleared for halfword alignment.
- Branch address alignment. The hardware masks the zero bit of the final address, ensuring strict halfword boundary alignment, which is critically important for correctly fetching the first instruction of the new set.
- Return address saving. The address of the instruction following THUMB is automatically written to the LR register, with the least significant bit set according to the new PSTATE.T state for correct future return.
- Program counter update. The PC register is forcibly loaded with the computed aligned target address, immediately initiating a fetch along the new pipeline path without flushing the prefetch buffer.
- Unconditional nature of the operation. The THUMB instruction ignores any current condition flags obtained from ALU operations, always performing the branch and mode change as an atomic switch operation.
- Interaction with IT blocks. If THUMB is placed inside an IT instruction block, it must always be the last instruction of the block, since its execution causes an irreversible state change of the decoder.
- Behavior in Thumb mode. When called from the Thumb state (T=1), the instruction switches the core to the ARM state, using the value of Rm as an absolute pointer to the first 32-bit handler instruction.
- Behavior in ARM mode. When called from the ARM state (T=0), the core transitions to Thumb, interpreting Rm as the base address for Thumb code; the least significant bit of Rm is ignored during address calculation but affects the T bit when written to LR.
- Pipeline synchronization. Executing THUMB causes a forced flush of the prefetch and decode stages of the pipeline to prevent any attempt to interpret previously cached instructions as code of the opposite length.
- Use in vector tables. The instruction is critically used in low-level exception handlers, where the reset vector address is loaded into Rm, providing immediate entry into compact Thumb code from the ARM exception state.
- Branch atomicity. The processor state switch and the new address write to PC are performed as an indivisible operation, guaranteeing that an asynchronous interrupt will not occur at a moment of undefined decoder state.
- Relationship to BX Rm. Functionally, the THUMB instruction is a deprecated but direct alias of the BX Rm instruction with the same encoding, differing only in the mnemonic to explicitly indicate the intent to change the instruction set.
- Zero offset behavior. If a zero value is specified in Rm, a mode switch will occur with control transferred to the current PC, which can be used for a forced state change without a meaningful branch.
- Register bank limitations. The instruction can address only registers R0-R7 for the THUMB.1 version, imposing restrictions on the use of high registers without prior data movement via the stack.
Comparisons
- Thumb vs RISC-V Compressed (RVC). Both extensions pursue a single goal: increasing code density through 16-bit instructions. Classic Thumb is an integral part of the ARM specification and requires a processor state switch, whereas RVC is transparently integrated into the base RISC-V ISA without context change, allowing free mixing of 16- and 32-bit instructions in a single stream.
- Thumb vs MIPS16. MIPS16, like Thumb, implements an alternative compressed instruction space requiring explicit entry and exit via special switch instructions (JALX in MIPS). The fundamental difference is that Thumb code is a full-fledged compilation target, whereas MIPS16 often requires a standard MIPS runtime for handling exceptions and complex arithmetic.
- Thumb vs ARM Thumb-2. Thumb was a strict 16-bit set with limited register access and a narrow range of conditional execution. With the advent of Thumb-2, the technology evolved into a mixed 32/16-bit architecture with a unified decoder, eliminating state switching overhead and restoring full access to the register file and condition flags while retaining code compactness.
- Thumb vs SuperH (SH-2). The SuperH architecture is also oriented toward a fixed 16-bit instruction length to achieve high code density; however, unlike Thumb, it was originally designed as a self-sufficient system without a backup 32-bit mode. This gave SH an advantage in fetch predictability, but Thumb benefited from the ability to return to full ARM instruction performance for resource-intensive computations.
- Thumb vs MicroMIPS. Both technologies implement the concept of compressed code by recoding 32-bit operations into a 16-bit format while maintaining assembler-level compatibility. The main difference from classic Thumb lies in how instruction width is determined: classic Thumb relies on dedicated branch instructions to change the interpretation context, whereas MicroMIPS encodes the width in the upper opcode bits of each instruction, so 16- and 32-bit instructions mix in one stream, much as in Thumb-2.
OS and driver support
Using Thumb in an operating system requires the kernel to handle the processor state explicitly. In ARM7TDMI and later cores, switching between the ARM (32-bit) and Thumb (16-bit) instruction sets is performed by a BX or BLX instruction whose target address has its least significant bit (bit 0) set for Thumb or cleared for ARM, rather than by directly changing the T state bit in the CPSR register; this makes the branch and the state change a single atomic step. Drivers written in Thumb are reached through interrupt vector tables, and each handler is placed in memory with 2-byte alignment. Because bit 0 of a code pointer carries the Thumb state flag rather than an address bit, the OS dispatcher must preserve it in handler and return addresses and mask it only when it needs the actual memory location. On exception entry, including supervisor calls (SVC), the hardware saves the T flag together with the rest of the processor status and restores it on return, so the processor resumes the compressed code correctly without additional checks by the OS.
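The role of bit 0 in an interworking branch can be modeled in a few lines. This is a simplified sketch, assuming a core that implements both ARM and Thumb states and a word-aligned target when branching to ARM code; the names bx, link_value, and BranchResult are hypothetical and exist only for this illustration.

```python
from typing import NamedTuple


class BranchResult(NamedTuple):
    pc: int      # address of the next instruction to fetch
    thumb: bool  # True if the core continues in Thumb state


def bx(rm: int) -> BranchResult:
    """Model BX Rm: bit 0 of Rm selects the state, the rest is the address.

    Assumption: ARM-state targets are already word aligned.
    """
    thumb = bool(rm & 1)
    pc = rm & 0xFFFFFFFE  # bit 0 is never part of the fetch address
    return BranchResult(pc, thumb)


def link_value(next_instr_addr: int, caller_in_thumb: bool) -> int:
    """Return address as stored in LR by a branch-with-link.

    When the caller runs in Thumb state, bit 0 is set so that a later
    BX LR returns to Thumb state.
    """
    return next_instr_addr | 1 if caller_in_thumb else next_instr_addr


# A Thumb caller at 0x2000 calls a routine; the routine returns with BX LR.
lr = link_value(0x2000, caller_in_thumb=True)
print(hex(lr), bx(lr))
```

This is why a dispatcher that stores handler pointers must keep bit 0 intact: clearing it would turn a return to Thumb code into a branch to ARM state at the same address.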
Security
Security mechanisms for Thumb code rely on hardware partitioning of access rights through domains and the MPU/MMU, which operate identically for 16-bit and 32-bit instructions: while the processor executes Thumb code, the memory management unit applies the same read/write/execute permission checks as for ARM instructions. Thumb instructions are always fetched from halfword-aligned addresses, because bit 0 of a branch target selects the instruction set state instead of addressing memory; on Thumb-only cores, an interworking branch to a target with bit 0 clear raises a fault, which limits attacks that try to shift the decoding by redirecting control flow. In addition, TrustZone technology for ARMv8-M implements secure calls of Thumb functions from the non-secure world through the SG (Secure Gateway) instruction, which must be the first instruction at the entry point. A call from the non-secure world that lands anywhere else in secure code is rejected, blocking unauthorized access to protected resources before any of the secure function's code executes.
Logging
Logging for the compressed instruction set relies on trace points implemented by Embedded Trace Macrocell (ETM) hardware modules, which support tracing of Thumb code. When tracing is enabled, the ETM generates compressed branch address packets in which each direct or indirect Thumb branch is marked with a 16-bit code execution flag, and the debugger or trace analyzer reconstructs the full instruction history by post-processing the data stream and aligning addresses to multiples of two. For software logging, a compiler invoked with profiling options automatically inserts a PUSH {LR} sequence and a trace hook call into the Thumb function prologue; the call uses the Thumb BL instruction, which is encoded as a pair of 16-bit halfwords. Because the branch range of classic Thumb BL is limited to ±4 MB, the linker groups Thumb log handlers into a dedicated memory section reachable through a long-branch trampoline in ARM code, which avoids branch offset overflow and the resulting loss of audit data.
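Whether a call needs a trampoline follows directly from the ±4 MB limit. The sketch below assumes the classic Thumb BL encoding, where the offset is a signed 22-bit count of halfwords taken relative to the address of the BL plus 4; the helper names are hypothetical and do not correspond to any linker API.

```python
# Classic Thumb BL: signed 22-bit halfword offset, i.e. a byte offset
# in the range [-4 MiB, +4 MiB - 2], relative to (address of BL + 4).
BL_MIN_OFFSET = -(1 << 22)
BL_MAX_OFFSET = (1 << 22) - 2


def bl_offset(bl_addr: int, target: int) -> int:
    """Byte offset that the BL at bl_addr would have to encode."""
    return target - (bl_addr + 4)


def bl_reaches(bl_addr: int, target: int) -> bool:
    """True if a single classic Thumb BL can reach target directly."""
    off = bl_offset(bl_addr, target)
    return off % 2 == 0 and BL_MIN_OFFSET <= off <= BL_MAX_OFFSET


# A hook 1 MiB away is reachable; one 8 MiB away needs a trampoline.
print(bl_reaches(0x08000000, 0x08100000))
print(bl_reaches(0x08000000, 0x08800000))
```

A linker performs an equivalent check per call site and inserts a trampoline only where the direct branch would overflow.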
Limitations
The main architectural limitations of Thumb stem from the reduced codeword length. Most instructions cannot be executed conditionally (branches are the exception), and Thumb-2 restores predication only through the explicit short IT instruction. In classic Thumb, arithmetic-logic operations have no direct access to the high registers R8–R12, except through specialized forms of MOV, CMP, and ADD, which forces the compiler to generate additional instructions that transfer values through low registers or the stack. The 16-bit load/store instructions have a 5-bit offset field, which for word accesses reaches only the first 32 words relative to the base register. Immediate constants are limited to the 0–255 range, so larger numbers must be built with additional shift or add instructions or loaded from a literal table, and each literal table must lie within about 1 KB after the PC-relative load instruction that uses it. This imposes strict requirements on the layout of critical code sections in tight memory areas.
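The numeric limits above can be written down as simple range checks. This is an illustrative sketch of the classic 16-bit encodings: an 8-bit immediate for MOV, a 5-bit word-scaled offset for LDR/STR, and an 8-bit word-scaled offset for PC-relative LDR, where the base is the address of the load plus 4, rounded down to a multiple of 4. The function names are hypothetical.

```python
def fits_mov_imm8(value: int) -> bool:
    """True if value can be loaded with a single 16-bit MOV Rd, #imm8."""
    return 0 <= value <= 255


def fits_ldr_word_imm5(offset: int) -> bool:
    """True if a byte offset fits LDR Rd, [Rn, #offset] (5-bit field, words)."""
    return offset % 4 == 0 and 0 <= offset <= 31 * 4


def literal_in_range(ldr_addr: int, literal_addr: int) -> bool:
    """True if a literal is reachable by LDR Rd, [PC, #offset] at ldr_addr."""
    base = (ldr_addr + 4) & ~3          # PC value seen by the load, word aligned
    offset = literal_addr - base
    return offset % 4 == 0 and 0 <= offset <= 255 * 4


print(fits_mov_imm8(255), fits_mov_imm8(256))
print(fits_ldr_word_imm5(124), fits_ldr_word_imm5(128))
print(literal_in_range(0x1000, 0x1400), literal_in_range(0x1000, 0x1404))
```

A compiler applies checks of this kind when deciding whether a constant fits in the instruction or has to go into a literal table, and where that table must be placed.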
History and development
Thumb development began in the early 1990s as ARM Ltd.’s response to the problem of low code density in ROM-limited devices. The first commercial release occurred in the ARMv4T architecture with the ARM7TDMI processor in 1995, where decompression of Thumb instructions into a 32-bit ARM equivalent happened at the pipeline decode stage without additional cycles, keeping performance close to that of ARM code with a loss of only about 30% on arithmetic-intensive operations. The evolution of the set led to Thumb-2, first introduced in ARMv6T2 with the ARM1156T2-S core and made a standard part of the ARMv7 architecture (2004-2007), which includes the Cortex-A8 processor. Its mixed encoding allowed 16-bit and new 32-bit instructions to be combined into a single continuous stream without mode switching, and added an extended set for interrupt handling and floating-point operations. The microcontroller profiles (ARMv6-M, ARMv7-M, and later ARMv8-M) eliminated the ARM set completely, leaving only Thumb, with optional DSP instructions and, in ARMv8-M, built-in security extensions. This made compressed code the primary and sole execution interface in energy-efficient profiles and cemented a design methodology in which the T state bit is always set and no software transitions are required.