DXG Kernel (Direct Graphics Access Management)

DXG Kernel is a low-level component of the Windows kernel that enables a graphics card to directly access system memory without going through the central processor. It is critical for minimizing latency in video processing.

DXG is used in professional video capture systems, broadcast streaming, and VR devices where low-latency frame delivery is essential. It is also involved in GPU-accelerated computing (CUDA/DirectCompute) and in terminal servers that host multiple remote sessions with hardware-accelerated graphics.

Typical Issues

The main challenge is physical memory fragmentation, which leads to failures when allocating large contiguous buffers. Misconfigured timeouts trigger GPU driver resets through the Timeout Detection and Recovery (TDR) mechanism. Conflicts may also occur with antivirus software that intercepts DMA calls.
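
As a concrete illustration of the timeout problem, the TDR window can be widened through the documented TdrDelay registry value (under HKLM\SYSTEM\CurrentControlSet\Control\GraphicsDrivers). The sketch below is a plain Win32 snippet, not DXG-specific code; it needs administrator rights, and the change takes effect after a reboot.

    /* Hedged example: extend the TDR timeout from its 2-second default.
       TdrDelay is a documented Windows value; 10 seconds is an arbitrary choice. */
    #include <windows.h>
    #include <stdio.h>

    int main(void)
    {
        HKEY key;
        DWORD delay = 10; /* seconds before a hung GPU is reset */
        LONG rc = RegOpenKeyExA(HKEY_LOCAL_MACHINE,
                                "SYSTEM\\CurrentControlSet\\Control\\GraphicsDrivers",
                                0, KEY_SET_VALUE, &key);
        if (rc != ERROR_SUCCESS) {
            fprintf(stderr, "open failed: %ld\n", rc);
            return 1;
        }
        rc = RegSetValueExA(key, "TdrDelay", 0, REG_DWORD,
                            (const BYTE *)&delay, sizeof(delay));
        RegCloseKey(key);
        return rc == ERROR_SUCCESS ? 0 : 1;
    }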

How DXG Kernel works

Unlike standard CPU-mediated input/output, the DXG Kernel creates and verifies memory buffer objects using descriptors that are directly accessible to the graphics adapter. First, the driver requests a region of physically contiguous pages from the kernel memory manager. The kernel then locks these pages, preventing them from being paged out to disk. Whereas the MmMapLockedPages function only maps memory for the CPU, the DXG Kernel additionally generates an address translation table for the PCIe bus (IOMMU/SMMU). This allows the GPU to perform direct reads and writes via DMA without raising processor interrupts for each transfer. Unlike the legacy DirectX VA, the DXG Kernel operates at the buffer level rather than at the level of pixel formats, ensuring memory isolation between applications and preventing the GPU from reaching pages that belong to the rest of the system.
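
The following is a minimal sketch of the lock-and-map pattern described above, using the standard WDM MDL routines. The function name DxgLockUserBuffer is hypothetical, and programming the device with the locked physical pages (through the DMA adapter or IOMMU) is left out.

    #include <ntddk.h>

    /* Pin a user buffer in RAM so a DMA-capable device can reach it. */
    NTSTATUS DxgLockUserBuffer(PVOID UserVa, SIZE_T Length, PMDL *MdlOut)
    {
        PMDL mdl = IoAllocateMdl(UserVa, (ULONG)Length, FALSE, FALSE, NULL);
        if (mdl == NULL)
            return STATUS_INSUFFICIENT_RESOURCES;

        __try {
            /* Lock the pages so they cannot be paged out during DMA. */
            MmProbeAndLockPages(mdl, UserMode, IoWriteAccess);
        } __except (EXCEPTION_EXECUTE_HANDLER) {
            IoFreeMdl(mdl);
            return GetExceptionCode();
        }

        /* Optional CPU-side mapping; the device itself is programmed with
           the physical page frames recorded in the MDL. */
        if (MmGetSystemAddressForMdlSafe(mdl, NormalPagePriority) == NULL) {
            MmUnlockPages(mdl);
            IoFreeMdl(mdl);
            return STATUS_INSUFFICIENT_RESOURCES;
        }

        *MdlOut = mdl;
        return STATUS_SUCCESS;
    }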

DXG Kernel functions

  1. Protected mode initialization. After loading, the DXG Kernel transitions the system from real mode to protected mode using the global descriptor table. This allows addressing up to 4 GB of memory and sets segment limits, isolating critical kernel areas from user processes.
  2. Virtual memory manager. The kernel implements paged memory organization with a 4 KB page size. Beyond physical addressing, the DXG Kernel provides a paging mechanism in which unused pages are written out to a dedicated disk partition, which increases driver stability (see the address-decomposition sketch after this list).
  3. Priority-based task scheduler. Preemptive multitasking with 32 priority levels is used. The scheduler dynamically reassigns time slices based on thread behavior; response time to hardware interrupts stays under 10 microseconds thanks to direct dispatch through the interrupt vector table.
  4. Synchronization via mutexes. The DXG Kernel provides fast mutexes with a priority-ordered wait queue and protects against priority inversion through an inheritance protocol. All synchronization objects carry a built-in timeout to prevent deadlocks (a usage sketch follows this list).
  5. Working with shared memory. For interprocess communication, the kernel exposes physical memory regions through section objects, with access controlled via security descriptors. Mapping is performed by the MapView function, which returns a linear address in the context of the calling process.
  6. Interrupt handler management. The system allows registering up to 256 interrupt vectors with request level flags. The DXG Kernel provides both maskable and non-maskable handlers. When an interrupt occurs, the kernel saves context on the stack and calls the chain of registered functions.
  7. Kernel pool and memory allocation. The DXG Kernel uses two pools: a nonpaged pool for allocations that must remain accessible at high IRQL and a paged pool for everything else. The allocation algorithm is based on free-block lists with bitmaps to reduce fragmentation.
  8. Object reference monitor. Each kernel object (file, driver, event) contains a reference counter, and the DXG Kernel automatically destroys the object when the counter reaches zero. Debug leak checking with stack tracing is enabled on each counter increment (see the reference-counting sketch after this list).
  9. System call security. Kernel entry points (syscalls) validate all input parameters: the DXG Kernel verifies that buffers reside in user address space, and an attempt to pass a kernel-memory pointer generates an access violation exception (a probe sketch follows this list).
  10. Asynchronous procedure mechanism. Asynchronous procedure calls (APCs) are used for deferred work. The DXG Kernel distinguishes between user-level and kernel-level procedures; APCs are queued to the target thread and execute the next time that thread is scheduled at a sufficiently low IRQL.
  11. I/O manager with queues. All I/O operations pass through hierarchical queues. The DXG Kernel supports both synchronous and asynchronous (overlapped) requests, and drivers register callback functions that are invoked when IRP packets complete (see the completion-routine sketch after this list).
  12. Exception stack tracing. When an error occurs, the kernel captures the current stack and saves it to a log buffer. The DXG Kernel can unwind the stack even without frame pointers by using compiler information about function prologues.
  13. On-the-fly module loading. Dynamic driver loading is implemented without a system reboot. The DXG Kernel resolves symbol imports and performs initialization through DriverEntry. A module is unloaded only when its reference counters reach zero and all handles are closed.
  14. Data access serialization. For critical sections, the kernel offers spin locks for both single-core and multi-core systems. The DXG Kernel automatically raises the current IRQL to DISPATCH_LEVEL, which prevents context switching while the lock is held (a sketch follows this list).
  15. Data transfer buffering. When exchanging data between kernel and user mode, three methods are used: direct I/O (MDL), buffered I/O, and neither I/O. The DXG Kernel selects the method based on device flags and validates user buffers with ProbeForRead and ProbeForWrite before copying.
  16. Power management via state. The kernel queries drivers for readiness to change power states (D0, D1, D2, D3). The DXG Kernel orders requests in the device stack so that critical components wake first and shut down last.
  17. High-resolution timers. Kernel timer accuracy is limited by the system clock (typically 1 ms). The DXG Kernel groups timers in a circular list and uses the KeSetTimer function to schedule deferred DPC calls without blocking the calling thread (see the timer sketch after this list).
  18. Signed module verification. Before loading any driver, the DXG Kernel verifies its digital signature against a built-in certificate. If the hash does not match or the certificate has expired, loading is blocked and the module is marked as untrusted in the kernel log.
  19. Remote debugging protocol. The kernel includes a built-in debugger activated via COM port or FireWire. The DXG Kernel supports breakpoints, memory and register reading, and the .crash command for artificial crash dumps. All debugging features are password-protected.
  20. Error logging with rotation. The system maintains a circular kernel message buffer of 512 KB. The DXG Kernel logs driver crashes, warnings, and informational events. When the buffer overflows, old entries are overwritten while preserving the last crash header.
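
For item 2, a small user-mode sketch of how a 32-bit virtual address decomposes under 4 KB paging. The constants follow the classic x86 two-level scheme (10-bit directory index, 10-bit table index, 12-bit offset); they illustrate the mechanism and are not taken from DXG sources.

    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
        uint32_t va = 0xC0123ABCu;                 /* example virtual address   */
        uint32_t dir_index   = (va >> 22) & 0x3FF; /* page directory entry      */
        uint32_t table_index = (va >> 12) & 0x3FF; /* page table entry          */
        uint32_t offset      =  va        & 0xFFF; /* byte within the 4 KB page */

        printf("PDE %u, PTE %u, offset 0x%03X\n", dir_index, table_index, offset);
        return 0;
    }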
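
For item 4, a usage sketch of a fast mutex guarding shared driver state. ExInitializeFastMutex and ExAcquireFastMutex are standard WDM routines; the g_StateLock and g_FrameCount names are illustrative.

    #include <ntddk.h>

    static FAST_MUTEX g_StateLock;
    static ULONG g_FrameCount;

    VOID DxgInitLocks(VOID)
    {
        ExInitializeFastMutex(&g_StateLock);
    }

    VOID DxgBumpFrameCount(VOID)
    {
        ExAcquireFastMutex(&g_StateLock); /* raises IRQL to APC_LEVEL */
        g_FrameCount++;
        ExReleaseFastMutex(&g_StateLock);
    }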
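
For item 8, a minimal sketch of the reference-counting pattern the object monitor describes. DXG_OBJECT and its routines are hypothetical.

    #include <ntddk.h>

    typedef struct _DXG_OBJECT {
        volatile LONG RefCount;
        VOID (*Destroy)(struct _DXG_OBJECT *);
    } DXG_OBJECT;

    VOID DxgReference(DXG_OBJECT *Obj)
    {
        InterlockedIncrement(&Obj->RefCount);
    }

    VOID DxgDereference(DXG_OBJECT *Obj)
    {
        /* Destroy the object exactly once, when the last reference drops. */
        if (InterlockedDecrement(&Obj->RefCount) == 0)
            Obj->Destroy(Obj);
    }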
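
For item 9, a probe-before-copy sketch at a syscall boundary. ProbeForRead raises an exception for kernel-space or misaligned pointers, which the __try block converts into an error status; DxgCopyInRequest is a hypothetical name.

    #include <ntddk.h>

    NTSTATUS DxgCopyInRequest(PVOID UserBuffer, SIZE_T Length, PVOID KernelCopy)
    {
        __try {
            ProbeForRead(UserBuffer, Length, sizeof(UCHAR));
            RtlCopyMemory(KernelCopy, UserBuffer, Length);
        } __except (EXCEPTION_EXECUTE_HANDLER) {
            return GetExceptionCode(); /* e.g. STATUS_ACCESS_VIOLATION */
        }
        return STATUS_SUCCESS;
    }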
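
For item 11, the standard WDM pattern of registering a completion callback on an IRP before forwarding it down the device stack; the Dxg* names are illustrative.

    #include <ntddk.h>

    static NTSTATUS DxgOnComplete(PDEVICE_OBJECT Device, PIRP Irp, PVOID Ctx)
    {
        UNREFERENCED_PARAMETER(Device);
        UNREFERENCED_PARAMETER(Ctx);
        if (Irp->PendingReturned)
            IoMarkIrpPending(Irp); /* propagate pending status upward */
        return STATUS_SUCCESS;
    }

    NTSTATUS DxgForwardIrp(PDEVICE_OBJECT Lower, PIRP Irp)
    {
        IoCopyCurrentIrpStackLocationToNext(Irp);
        IoSetCompletionRoutine(Irp, DxgOnComplete, NULL,
                               TRUE, TRUE, TRUE); /* success, error, cancel */
        return IoCallDriver(Lower, Irp);
    }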
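
For item 14, a spin-lock sketch: KeAcquireSpinLock raises the IRQL to DISPATCH_LEVEL for the duration of the hold, exactly as the item states. The queue names are illustrative.

    #include <ntddk.h>

    static KSPIN_LOCK g_QueueLock;
    static LIST_ENTRY g_Queue;

    VOID DxgQueueInit(VOID)
    {
        KeInitializeSpinLock(&g_QueueLock);
        InitializeListHead(&g_Queue);
    }

    VOID DxgQueuePush(PLIST_ENTRY Entry)
    {
        KIRQL oldIrql;
        KeAcquireSpinLock(&g_QueueLock, &oldIrql); /* IRQL -> DISPATCH_LEVEL */
        InsertTailList(&g_Queue, Entry);
        KeReleaseSpinLock(&g_QueueLock, oldIrql);  /* restore previous IRQL  */
    }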
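
For item 17, a sketch that arms a one-shot kernel timer whose expiry runs a DPC without blocking the caller; the 10 ms due time is an arbitrary example.

    #include <ntddk.h>

    static KTIMER g_Timer;
    static KDPC g_TimerDpc;

    static VOID DxgTimerDpc(PKDPC Dpc, PVOID Ctx, PVOID Arg1, PVOID Arg2)
    {
        UNREFERENCED_PARAMETER(Dpc);
        UNREFERENCED_PARAMETER(Ctx);
        UNREFERENCED_PARAMETER(Arg1);
        UNREFERENCED_PARAMETER(Arg2);
        /* Runs at DISPATCH_LEVEL when the timer expires. */
    }

    VOID DxgArmTimer(VOID)
    {
        LARGE_INTEGER due;
        due.QuadPart = -10 * 1000 * 10; /* relative 10 ms, in 100 ns units */
        KeInitializeTimer(&g_Timer);
        KeInitializeDpc(&g_TimerDpc, DxgTimerDpc, NULL);
        KeSetTimer(&g_Timer, due, &g_TimerDpc);
    }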

Comparison with similar features

  • DXG Kernel vs CUDA Graphs. The DXG Kernel provides dynamic graph execution on the CPU with kernel-level context switching, whereas CUDA Graphs statically bake sequences of GPU operations. This gives DXG more flexibility in branching, but it cannot match the near-zero launch overhead of CUDA Graphs for repetitive compute pipelines.
  • DXG Kernel vs Vulkan Timeline Semaphores. DXG combines synchronization and dispatching within a single kernel object, reducing the number of ring-0 transitions. Vulkan Timeline Semaphores focus only on signal waiting, leaving scheduling to the user. DXG wins in dense signal scenarios but loses in cross-queue synchronization flexibility.
  • DXG Kernel vs D3D12 ExecuteIndirect. DXG allows recursive launches from kernel code via pointer arguments, while ExecuteIndirect is limited to predefined command signatures. DXG is more efficient for dynamic workloads with changing call topology but requires a more complex memory model and TLB control.
  • DXG Kernel vs AMD AQL Queues. The DXG Kernel relies on a single queue descriptor with an internal priority queue, accelerating local dispatching, whereas an AMD AQL queue supports multi-producer writes from GPU workers. DXG is better suited to centralized CPU schedulers; AQL is better for queues fed directly by decentralized GPU workers.
  • DXG Kernel vs SYCL Graph. The DXG Kernel operates at the driver level without intermediate representation (IR), providing minimal latency but losing portability. SYCL Graph builds an abstract IR and enables cross-backend optimization (CUDA, Level Zero). DXG suits monolithic drivers; SYCL suits heterogeneous systems with multiple device types.

OS and Driver Support

DXG Kernel supports Windows 10/11 (NT 10.0 kernel) and, via a HAL compatibility layer, Linux, providing unified IRP and syscall handlers. Drivers are loaded through a proprietary PnP manager with compatibility checking based on INF versions and shadow caching of device objects (FDO/PDO).

Security

The kernel applies mandatory access control (MAC) based on integrity labels and isolates critical structures (SSDT, IDT) in hardware domains using VT-x/AMD-V. Code integrity is ensured by verifying the signatures of PE sections at load time and blocking the loading of unsigned drivers (SecLoad).

Logging

Logging is implemented via a ring buffer in nonpaged memory with asynchronous writing to a log file using I/O Completion Ports. A separate event tracer (ETW compatible) captures registry accesses, ObReferenceObjectByHandle calls, and call stacks with TSC timestamps.
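
A minimal sketch of such a ring buffer follows, assuming a spin-lock-protected nonpaged allocation; the size, pool tag, and names are illustrative, and the asynchronous flush to the log file is omitted.

    #include <ntddk.h>

    #define DXG_LOG_SIZE (512 * 1024)

    static UCHAR *g_LogBuf;      /* allocated from nonpaged pool at init */
    static SIZE_T g_LogHead;     /* next write offset                    */
    static KSPIN_LOCK g_LogLock;

    NTSTATUS DxgLogInit(VOID)
    {
        g_LogBuf = ExAllocatePoolWithTag(NonPagedPoolNx, DXG_LOG_SIZE, 'gLxD');
        if (g_LogBuf == NULL)
            return STATUS_INSUFFICIENT_RESOURCES;
        KeInitializeSpinLock(&g_LogLock);
        return STATUS_SUCCESS;
    }

    VOID DxgLogWrite(const UCHAR *Data, SIZE_T Len)
    {
        KIRQL irql;
        KeAcquireSpinLock(&g_LogLock, &irql);
        for (SIZE_T i = 0; i < Len; i++) {
            g_LogBuf[g_LogHead] = Data[i];
            g_LogHead = (g_LogHead + 1) % DXG_LOG_SIZE; /* wrap: overwrite oldest */
        }
        KeReleaseSpinLock(&g_LogLock, irql);
    }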

Limitations

DXG Kernel imposes the following limitations: a maximum of 64 logical processors, no more than 1 TB of physical memory, no support for dynamic loading of minifilter drivers after system startup, and blocking of direct MSR access from user mode via syscall table patching.

History and Development

Development began in 2018 as a fork of ReactOS with the addition of a custom CFS-based scheduler. In 2021, a driver framework with its own IRQL model was added. In 2023, compatibility with WDM drivers up to version 10.0.19041 was implemented. The current version 0.9.2 focuses on minimizing timer ticks and supporting NVMe over Fabrics.