Matrix (Storing data in tabular form)

Matrix (Two-Dimensional Array of Numbers) is a rectangular grid of numbers organized into rows and columns. Each value has a double index: the first indicates the row, the second indicates the column. This approach allows efficient storage and processing of structured numerical data as a single whole.

Matrices are widely used in computer graphics for image transformation: moving, scaling, and rotating objects. In machine learning, they serve as the foundation for representing datasets and neural network weights. Engineering and scientific calculations, including the finite element method and optimization problems, rely on matrix operations. Physical modeling, statistical analysis, economic input-output models, and recommendation system algorithms also use matrices as the primary way of organizing information.

The most common problem is going out of the valid index range, which leads to memory corruption or program crash. Dimension mismatch during matrix multiplication or addition causes critical errors at runtime. Significant difficulties arise when processing sparse matrices, where zero values predominate, and naive algorithms waste resources inefficiently. Additional computational difficulties are created by poor matrix conditioning, when small errors in input data lead to colossal distortions of the result in linear algebra problems.

How Matrix works

The operating principle of a matrix is based on representing a two-dimensional structure in a one-dimensional memory address space. At the physical level, data is arranged either row-wise (Row-major order), as in the C language, or column-wise (Column-major order), as in Fortran and MATLAB. In row-wise arrangement, elements of a row are stored in adjacent cells, and to access the element with indices (i, j) a linear offset is calculated: i * number_of_columns + j. In column-wise arrangement, the formula changes to j * number_of_rows + i, and elements of a single column lie sequentially in memory. The choice of storage scheme critically affects performance due to processor cache memory architecture: sequential traversal of an array in accordance with its physical arrangement minimizes cache misses and can speed up algorithm execution by tens of times compared to random access or traversal across storage lines.

Unlike one-dimensional arrays, which provide constant-time access to a linear sequence, a matrix adds indexing arithmetic, slightly increasing overhead. Compared to dynamic structures such as linked lists or nested hash tables, a matrix provides maximum access speed and spatial locality of data, but lacks flexibility when resizing. Static matrices require allocation of a continuous memory block at creation and cannot expand; dynamic implementations via arrays of pointers to rows solve this problem at the cost of fragmentation and double indirection. Modern linear algebra libraries use vectorized SIMD instructions for parallel processing of matrix elements, and specialized storage formats, such as CSR for sparse matrices, eliminate storing zeros and replace them with three one-dimensional arrays describing values, column indices, and row start offsets.

Matrix functionality

  1. Creating a matrix via nested arrays. A matrix in the C language can be declared as a two-dimensional array of fixed size with the syntax float matrix[ROWS][COLS]. Memory is allocated on the stack continuously, with elements arranged in Row-Major order, which guarantees fast row-wise data scanning.
  2. Manual memory management via pointers. For dynamic memory allocation, an array of pointers to rows is used: float **M = malloc(R * sizeof(float*)) with a subsequent loop M[i] = malloc(C * sizeof(float)). This approach allows setting dimensions at runtime, but rows are physically scattered in the heap.
  3. Allocating a single memory block. An efficient technique of allocating a flat one-dimensional array float *data = malloc(R * C * sizeof(float)) with access via the macro #define IDX(i, j) ((i)*C + (j)). This guarantees data locality and allows passing the matrix to BLAS libraries as a single pointer without overhead.
  4. Initializing a zero matrix. The function void zero_matrix(float *M, int n) implements zeroing. For static arrays, calling memset(M, 0, sizeof(M)) is permissible, but for the dynamic flat model memset(data, 0, R*C*sizeof(float)) works correctly, since the representation of float zero coincides with bitwise zeroing according to IEEE 754.
  5. Filling an identity matrix. The function for setting diagonal elements to 1.0 and the rest to 0.0 must accept the matrix as a flat array. The loop body checks the condition if (i == j) M[IDX(i,j)] = 1.0f; else M[IDX(i,j)] = 0.0f;, which forms the neutral element of the matrix multiplication operation.
  6. Element-wise matrix copying. The copy_matrix operation performs deep copying of data from source to destination without changing the order of elements. The C implementation looks like memcpy(dst, src, R*C*sizeof(float)) if both arrays are allocated as a flat block, which minimizes time costs for transfer.
  7. In-place matrix transposition. Cyclic swapping of elements relative to the main diagonal requires the iteration for (i=0; i<R; i++) for (j=i+1; j<C; j++) swap(&M[IDX(i,j)], &M[IDX(j,i)]). The algorithm is correct only for square matrices, otherwise allocation of a new buffer is required.
  8. Matrix addition procedure. The add_matrix function calculates the element-wise sum C[i][j] = A[i][j] + B[i][j]. Dimensions must strictly match; in speed-critical applications, vectorization is used via SIMD intrinsics _mm256_add_ps, processing eight float elements at once per cycle.
  9. Matrix subtraction with bounds control. Similar to addition, the subtraction implementation performs M_out[IDX(i,j)] = M1[IDX(i,j)] - M2[IDX(i,j)] in a double nested loop. For safe work with external data, assertions assert(R1==R2 && C1==C2) are added, preventing memory corruption when dimensions diverge.
  10. Element-wise multiplication (Hadamard product). The hadamard_product operator multiplies corresponding elements of matrices of the same size. It is not standard matrix multiplication; it is used in convolutional neural networks and signal processing. It is implemented as C[i][j] = A[i][j] * B[i][j] within a linear pass.
  11. Classic matrix multiplication. The triple nested loop for (i) for (j) for (k) C[i][j] += A[i][k] * B[k][j] implements an algorithm with O(N³) complexity. To improve performance, the inner loop is unrolled, and sum accumulation is kept in a register variable to avoid frequent access to slow memory.
  12. Multiplication with transposition of the second matrix. The pattern C = A * B^T optimizes memory access by changing the loop order to i, j, k. When indexing B[k][j] instead of B[j][k], cache misses are reduced, as reading rows of both matrices becomes sequential, which is critically important for large sizes.
  13. Determinant calculation function. Recursive expansion along the first row or reduction to triangular form by the Gaussian method. For small fixed sizes (2×2, 3×3, 4×4), hardcoded formulas are applied, for example det2 = a[0]*a[3] - a[1]*a[2], which eliminates overhead on loops and recursion.
  14. Matrix inversion by the Gauss-Jordan method. An augmented matrix [A | I] is created, after which the forward pass zeros elements below the diagonal, and the backward pass zeros elements above the diagonal. In the end, the right part contains A^{-1}. Checking the pivot element for zero with selection of the maximum in the column for numerical stability is mandatory.
  15. Finding the maximum element. The find_max function scans the matrix linearly, maintaining the current maximum in a register variable. Starting value float max_val = M[0]. The loop compares fabsf(M[idx]) with max_val, returning the absolute maximum, which is necessary for data scaling.
  16. Scalar operation arithmetic. Multiplying all elements of a matrix by a constant scale_matrix(float *M, float alpha) applies the expression M[i] *= alpha. Adding a scalar increases image brightness: M[i] += beta. Modern compilers vectorize this loop automatically without modifying the source code.
  17. Scalar (Converting a multidimensional tensor into a single number)
  18. Checking matrix symmetry. The predicate is_symmetric traverses the upper triangle, checking M[IDX(i,j)] == M[IDX(j,i)] considering an epsilon tolerance fabsf(a - b) < 1e-6f for floating point. At the first violation, false is returned, making the check fast on asymmetric data.
  19. Normalizing matrix rows. The normalize_rows function calculates the Euclidean norm for each row: float nrm = sqrtf(sum_sq), after which all elements of the row are divided by this value. If the norm is close to zero, division is skipped to avoid the appearance of NaN in the resulting array.
  20. Extracting a submatrix. The procedure extracts a block of size sub_r x sub_c, starting from indices (row_off, col_off). The inner loop copies elements src[(i + row_off)*C + (j + col_off)] to dst[i*sub_c + j]. Bounds control prevents reading outside the source array through assertion checks.
  21. Formatted matrix output to console. The print_matrix function implements row-wise traversal with field width specification via printf("%8.4f ", val). For engineering visualization, a check for NAN and INF is added with replacement by special symbols. Output ends with a newline character after each matrix row.
  22. Export and import via binary dump. To save state, fwrite(data, sizeof(float), R*C, fp) is used. Reading is performed by strictly symmetrical fread. The file header often includes a signature and dimensions to automatically validate the structure before memory allocation when loading.

Comparisons

  • Matrix vs Vector. A vector is a one-dimensional array of numbers, whereas a matrix is a two-dimensional structure with rows and columns. A vector is ideal for describing simple sequences and points in space. A matrix extends this concept, allowing storage of tabular data and performance of linear algebra operations, such as multiplying rows by columns, impossible within a one-dimensional array without modifying its shape.
  • Vector (Ordered storage of numbers in continuous memory)
  • Matrix vs DataFrame. A DataFrame, unlike a purely numerical matrix, is a heterogeneous tabular structure. It allows storing columns of different data types (integers, strings, dates) and operates with explicit row indices and column names. A matrix, on the other hand, is mathematically strict: it is homogeneous and optimized exclusively for high-performance numerical calculations without the overhead of table metadata.
  • Matrix vs Array (multidimensional). An Array is a more general structure, capable of having three, four, or more dimensions, forming tensors. A matrix is a special case of an array with strictly two axes. When transitioning from a two-dimensional matrix to a multidimensional array, the specific syntax of matrix multiplication is lost. Interaction becomes element-wise, or requires specifying summation axes for generalized convolution operations.
  • Matrix vs List of Lists. A list of lists in general-purpose languages is a nested collection with dynamic typing and non-homogeneous length of nested elements, which does not guarantee rectangular shape. A matrix is a continuous block of memory with guaranteed equal row length. This provides a fundamental difference in performance: matrix operations use vector processor instructions and cache memory, while nested lists suffer from data fragmentation.
  • Matrix vs Sparse Matrix. A regular (dense) matrix stores every value in memory, including zeros. A Sparse Matrix is algorithmically structured differently: it stores only non-zero coefficients and their coordinates. The comparison reveals a compromise between universality and storage efficiency. The dense format is indispensable for compact data, while the sparse format is critical for graph problems with millions of nodes, where allocating memory for zeros is arithmetically impossible.

OS and driver support

Low-level representation of a matrix is implemented through a continuous block of virtual memory allocated by the system calls mmap or VirtualAlloc, allowing the operating system to transparently page data from disk when physical RAM is insufficient; graphics processor drivers interact with this block via the direct memory access mechanism, bypassing the central processor, for which the driver translates the virtual buffer address into physical PCI Express bus addresses and programs the DMA controller for asynchronous data transfer between host memory and accelerator global memory.

Data security and isolation

The boundaries of a two-dimensional array are checked at the compilation stage using template metaprogramming instantiating the class Matrix<Rows, Cols>, where dimensions are non-type parameters, and the access operator operator()(size_t i, size_t j) contains a runtime assertion assert(i < Rows && j < Cols), generating a SIGABRT signal when going out of bounds, whereas in performance-sensitive builds, address space isolation is applied by placing each array in a separate anonymous memory region with flags MAP_ANONYMOUS | MAP_PRIVATE, immediately followed by a guard page without access rights, causing an immediate kernel interrupt on an attempt of linear buffer overflow.

Logging and operation audit system

Each algebraic matrix transformation is wrapped in a functor that stores in an associative container the hash of arguments, a monotonic clock timestamp, the executing thread identifier, and a transaction trace identifier, then serializes this record into an asynchronous ring buffer of fixed size, from which a dedicated consumer thread asynchronously flushes data to a log file with rotation upon the SIGHUP signal or when a threshold size is reached, using the buffered I/O mechanism setvbuf with block synchronization to minimize system calls.

Limitations

The matrix dimension is limited by the maximum value of the size_t type, however the practical limit is imposed by the volume of available continuous virtual address space, which on systems with 48-bit addressing amounts to 256 terabytes, while the two-dimensional index [i][j] when using row-major layout is translated into the one-dimensional offset i * Cols + j, and integer overflow when calculating this offset for extremely large arrays is prevented by inserting checking code using the compiler built-in functions __builtin_mul_overflow, whereas row alignment to the processor cache line boundary inevitably introduces row tail fragmentation, wasting up to 63 bytes per row in strict alignment mode.

History and evolution

The representation of a two-dimensional numerical array evolved from static declarations double A[10][10] in the K&R C standard, which placed data on the call stack, through the introduction of dynamic allocation of an array of pointers to arrays in C89, which led to data non-locality and double dereferencing, to the modern approach of a single flat memory block with indexing operator overloading in C++11, using an auxiliary proxy class to emulate double indexing semantics; further evolution included the standardization of mdspan in C++23, providing a multidimensional view over an arbitrary continuous buffer with customizable access policies and traversal order, which separated logical dimensionality from physical layout without copying data.