
Lightning Talk: Trade-Offs For Developing File Aggregated I/O For Asynchronous Checkpointing
Description
Asynchronous checkpoint-restart (C/R) has become popular in recent years for its ability to checkpoint alongside the running application. One implementation is VELOC, which quickly checkpoints to a local storage device and then flushes the checkpoints to the parallel file system (PFS) concurrently with the application. VELOC natively adopts a file-per-process checkpointing strategy, meaning each distributed application process creates its own checkpoint file. File-per-process is easy to implement, enables embarrassing parallelism, and often provides high throughput by bypassing strict POSIX semantics. At sufficient scale, however, the resulting checkpoints become difficult for users to verify, migrate, and manage. Further, file-per-process strategies do not scale: at large process counts they oversubscribe the underlying storage hardware, lowering the performance of both the application and checkpoint persistence.
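The file-count growth behind the management problem above can be made concrete with a tiny sketch. The naming scheme `ckpt.{step}.rank{r}.dat` is hypothetical, for illustration only, and is not VELOC's actual on-disk layout:

```python
# Toy illustration of file-per-process checkpointing: every rank writes
# its own file, so N ranks produce N files per checkpoint step.
def fpp_filenames(n_ranks, step):
    """Hypothetical per-rank checkpoint file names for one step."""
    return [f"ckpt.{step}.rank{r}.dat" for r in range(n_ranks)]

# At 10,000 ranks a single checkpoint already yields 10,000 files
# for the user (and the PFS metadata servers) to track.
names = fpp_filenames(10_000, step=3)
assert len(names) == 10_000
assert names[0] == "ckpt.3.rank0.dat"
```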

To alleviate these challenges, asynchronous C/R must adopt file aggregation techniques. Aggregation is nontrivial, however: it requires coordination (e.g., synchronization) between processes and compute nodes, while also respecting the resource competition between the application and the asynchronous C/R implementation. The most common aggregation approach is two-phase I/O, wherein a subset of processes is designated as I/O leaders that flush data to the PFS on behalf of all processes. State-of-the-art implementations of two-phase I/O, such as MPI-IO, overlap the data-exchange phase with the flushing phase. However, previous work has shown that such aggregation methods are insufficient for asynchronous checkpointing due to the inherent synchronization cost of I/O aggregation; further, they have no mechanism for limiting resource consumption and thereby negatively impact the concurrently running application. This talk discusses our work toward a tunable I/O aggregation strategy that operates efficiently in the background to complement asynchronous C/R. We analyze trade-offs and discuss the performance impact on large-scale microbenchmarks. Specifically, we explore how to (1) implement efficient, thread-safe data reception into the limited-sized write buffers of I/O leaders, (2) prioritize remote (from non-leaders) versus local data on I/O leaders to minimize checkpoint overhead, and (3) load-balance flushing across the I/O leaders.
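The thread-safe, limited-buffer reception described in point (1) can be sketched in miniature. This is an illustrative single-process model, not VELOC's or MPI-IO's API: non-leader ranks are simulated as threads feeding a bounded queue on the leader, and a writer thread drains it (standing in for the PFS flush):

```python
import queue
import threading

def aggregate(chunks_per_rank, buffer_slots=2):
    """Toy two-phase aggregation: one I/O leader drains a bounded,
    thread-safe buffer fed by non-leader 'ranks' (threads here).
    All names are illustrative, not a real checkpointing API."""
    buf = queue.Queue(maxsize=buffer_slots)  # limited-sized write buffer
    written = []  # stand-in for data flushed to the PFS

    def producer(rank, chunks):
        # Non-leader sends its checkpoint chunks, then a None sentinel.
        for c in chunks:
            buf.put((rank, c))  # blocks while the leader's buffer is full

    def writer(n_ranks):
        # Leader drains the buffer until every rank has signaled completion.
        done = 0
        while done < n_ranks:
            rank, c = buf.get()
            if c is None:
                done += 1
            else:
                written.append((rank, c))

    producers = [threading.Thread(target=producer, args=(r, cs + [None]))
                 for r, cs in enumerate(chunks_per_rank)]
    w = threading.Thread(target=writer, args=(len(producers),))
    for t in producers:
        t.start()
    w.start()
    for t in producers:
        t.join()
    w.join()
    return written
```

Because `put` blocks when the buffer is full, senders naturally back off instead of overrunning the leader's memory; a real implementation must additionally decide how to prioritize remote versus local chunks (point 2) and how to spread ranks across multiple leaders (point 3).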
Event Type
Workshop
Time
Sunday, 12 November 2023, 5:10pm - 5:20pm MST
Location
710
Tags
Fault Handling and Tolerance
Registration Categories
W