Related Work
A range of systems seek to reduce CPU involvement in accelerator data access, each making different trade-offs among performance, programmability, and interoperability. Broadly, prior work falls into three categories: systems that accelerate the data path while preserving CPU-initiated control, systems that reassign I/O initiation and scheduling to accelerators through exclusive device ownership, and systems that restructure file system services around accelerator execution models.
GPUDirect Storage (GDS) integrates peer-to-peer DMA into the operating-system I/O stack to eliminate redundant host memory copies across a range of storage backends, including locally attached NVMe devices, RDMA-accessible remote storage, and distributed and parallel file systems [2]. While this substantially improves data movement efficiency and broadens GPU access to diverse storage infrastructures, all I/O remains CPU-initiated and CPU-orchestrated, including request scheduling, metadata processing, and error handling. GDS therefore accelerates the data path without altering the initiator or control model of storage I/O.
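For concreteness, the sketch below shows what a CPU-initiated GDS transfer looks like through the cuFile API, GDS's user-facing interface: the CPU opens the file, registers it and a GPU buffer with cuFile, and issues the read, while the NVMe device DMAs the data directly into GPU memory. The file path and transfer size are placeholders, and error handling is abbreviated for brevity.

```cuda
// Minimal sketch of a CPU-initiated GDS read via the cuFile API.
#include <fcntl.h>
#include <unistd.h>
#include <cuda_runtime.h>
#include "cufile.h"

int main() {
    const size_t size = 1 << 20;                 // 1 MiB transfer (assumed)
    int fd = open("/mnt/nvme/data.bin",          // placeholder path
                  O_RDONLY | O_DIRECT);

    CUfileDescr_t descr = {};
    descr.handle.fd = fd;
    descr.type = CU_FILE_HANDLE_TYPE_OPAQUE_FD;

    CUfileHandle_t handle;
    cuFileDriverOpen();                          // initialize the GDS driver
    cuFileHandleRegister(&handle, &descr);       // bind the fd to a cuFile handle

    void *dev_buf;
    cudaMalloc(&dev_buf, size);
    cuFileBufRegister(dev_buf, size, 0);         // register GPU buffer for P2P DMA

    // The CPU initiates the read; the NVMe device DMAs straight into
    // GPU memory, bypassing the host bounce buffer.
    ssize_t n = cuFileRead(handle, dev_buf, size, /*file_offset=*/0,
                           /*devPtr_offset=*/0);

    cuFileBufDeregister(dev_buf);
    cuFileHandleDeregister(handle);
    cuFileDriverClose();
    cudaFree(dev_buf);
    close(fd);
    return n < 0;
}
```

Note that even with the zero-copy data path, every step above runs on the CPU: initiation, scheduling, and completion all remain host responsibilities.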
Big Accelerator Memory (BaM) takes the opposite approach, enabling GPU-initiated access to NVMe devices through exclusive device ownership and a custom driver and metadata stack, so that accelerators both schedule and execute I/O directly [4, 18]. This design achieves near-line-rate performance by removing the kernel and OS from the fast path, but it does so by abandoning file systems, POSIX semantics, and interoperability with OS-managed storage.
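The core mechanism is illustrated below: in the BaM style, NVMe submission queues and doorbell registers are mapped into the GPU's address space so that a GPU thread can build and submit a command without any CPU involvement. The sketch is purely illustrative; the structure and function names are assumptions and greatly simplify BaM's actual design, which also handles command-slot arbitration, completion polling, and caching.

```cuda
// Hypothetical sketch of GPU-initiated NVMe submission in the BaM style.
// All names (nvme_sqe_t, sq_t, gpu_read_blocks) are illustrative, not
// BaM's actual interface.
#include <cstdint>

struct nvme_sqe_t {           // simplified NVMe submission-queue entry
    uint8_t  opcode;
    uint16_t cid;             // command identifier
    uint32_t nsid;            // namespace id
    uint64_t prp1;            // physical address of the GPU data buffer
    uint64_t slba;            // starting logical block address
    uint16_t nlb;             // number of logical blocks, zero-based
};

struct sq_t {                 // submission queue mapped over the PCIe BAR
    nvme_sqe_t        *entries;
    volatile uint32_t *doorbell;
    uint32_t           depth;
};

__device__ void gpu_read_blocks(sq_t *sq, uint32_t tail,
                                uint64_t buf_phys, uint64_t lba,
                                uint16_t nblocks) {
    nvme_sqe_t *sqe = &sq->entries[tail % sq->depth];
    sqe->opcode = 0x02;                 // NVMe READ
    sqe->cid    = tail;                 // reuse tail as command id (sketch)
    sqe->nsid   = 1;
    sqe->prp1   = buf_phys;             // DMA target: GPU memory
    sqe->slba   = lba;
    sqe->nlb    = nblocks - 1;
    __threadfence_system();             // make the entry visible to the device
    *sq->doorbell = (tail + 1) % sq->depth;  // ring doorbell; device fetches SQE
}
```

Because the GPU writes raw block addresses and doorbells directly, nothing in this path consults a file system: the price of the fast path is that the device is opaque to the OS.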
GeminiFS introduces a GPU-centric companion file system that integrates GPU execution with file system metadata operations through a custom kernel module and modified on-disk formats [19]. Coupling metadata management closely to GPU execution reduces CPU involvement on both the control and data paths, but it requires changes to the file system layout and kernel infrastructure, limiting compatibility with existing Linux file systems.
GoFS reorganizes file system metadata management and I/O services around accelerator execution models, exposing file system functionality through a specialized runtime rather than the conventional kernel storage stack [20]. While this design aligns storage services with accelerator-centric execution, it departs from OS-managed coordination and does not interoperate with unmodified on-disk file system formats or kernel semantics.
Accelerator-integrated Storage I/O (AiSIO) differs from these approaches, and from the systems evaluated in prior work [18], by preserving existing file system formats and Linux kernel semantics while allowing CPU-initiated and device-initiated I/O paths to coexist under host orchestration. The Host-Orchestrated Multipath I/O (HOMI) reference implementation demonstrates how accelerators can initiate and drive storage I/O without assuming exclusive device ownership or undermining operating-system-managed storage abstractions.
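To make this coexistence concrete, the following is a hypothetical host-side dispatcher in the spirit of the multipath model, not HOMI's actual code: the host retains ownership of the file descriptor, metadata, and error handling, and per request either performs a conventional POSIX read staged through pinned host memory or hands the transfer to a device-initiated path it has prepared. The homi_device_submit entry point is an assumed placeholder.

```cuda
// Illustrative host-side dispatch over coexisting I/O paths.
#include <unistd.h>
#include <cuda_runtime.h>

// Hypothetical entry point for the device-initiated path: the host has
// already resolved file extents and granted the accelerator a queue.
extern ssize_t homi_device_submit(int fd, void *dev_buf,
                                  size_t size, off_t off);

ssize_t multipath_read(int fd, void *dev_buf, size_t size, off_t off,
                       bool device_path) {
    if (device_path) {
        // Device-initiated: the accelerator drives the transfer, but the
        // host still owns the fd, metadata, and error handling.
        return homi_device_submit(fd, dev_buf, size, off);
    }
    // CPU-initiated fallback: ordinary POSIX read staged through a
    // pinned host buffer, then copied to the GPU.
    char *host_buf = nullptr;
    cudaMallocHost((void **)&host_buf, size);
    ssize_t n = pread(fd, host_buf, size, off);
    if (n > 0)
        cudaMemcpy(dev_buf, host_buf, (size_t)n, cudaMemcpyHostToDevice);
    cudaFreeHost(host_buf);
    return n;
}
```

The point of the sketch is the control structure: because both branches operate on the same OS-managed file descriptor, neither path requires exclusive device ownership or a departure from existing on-disk formats.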