
Presentation

This content is available for: Tech Program Reg Pass, Exhibits Reg Pass.
File Aggregation for Asynchronous Multi-Level Checkpointing
Description
Checkpointing serves many purposes in modern HPC systems and applications. In recent years, synchronous checkpointing, which blocks the application until checkpoints are persisted to external storage, has suffered from rising synchronization overheads at scale, leaving the application little time for forward progress. Asynchronous checkpointing has therefore become more popular: it quickly captures checkpoints locally and flushes them to external storage in the background, concurrently with the application. State-of-the-art solutions such as VELOC use a file-per-process strategy, which at scale produces many files that are difficult for users and parallel file systems to manage. We implement a tunable N-to-M aggregation strategy within VELOC that achieves 2.5x higher throughput than the state-of-the-art aggregation library ADIOS2 and 1.5x higher throughput than the naive N-to-1 aggregation currently supported by VELOC.
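The N-to-M idea can be illustrated with a small planning step that decides which of M aggregated files each of N processes writes to, and at what byte offset. This is a minimal sketch under stated assumptions, not VELOC's implementation or API: the function name `plan_n_to_m`, the contiguous grouping of ranks, and the reading of M as the tunable parameter are all hypothetical. Under these assumptions, M = N reduces to file-per-process output and M = 1 reduces to N-to-1 aggregation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Placement:
    rank: int        # process (rank) index in [0, N)
    file_index: int  # which of the M aggregated files this rank writes to
    offset: int      # byte offset of this rank's checkpoint inside that file


def plan_n_to_m(sizes: list[int], m: int) -> list[Placement]:
    """Hypothetical N-to-M plan: sizes[r] is rank r's checkpoint size in bytes.

    Illustration only; not the VELOC or ADIOS2 API.
    """
    n = len(sizes)
    if not 1 <= m <= n:
        raise ValueError("need 1 <= M <= N")
    next_offset = [0] * m  # running end-of-data position in each aggregated file
    placements = []
    for rank, size in enumerate(sizes):
        # Contiguous groups of roughly N/M ranks share one file.
        f = rank * m // n
        placements.append(Placement(rank, f, next_offset[f]))
        next_offset[f] += size
    return placements


if __name__ == "__main__":
    for p in plan_n_to_m([100, 200, 50, 150], m=2):
        print(p)
```

Here ranks 0 and 1 share file 0 (at offsets 0 and 100) and ranks 2 and 3 share file 1 (at offsets 0 and 50). In a distributed setting, the per-group offsets would more plausibly be computed with a collective such as an MPI exclusive scan (`MPI_Exscan`) over each group's communicator rather than serially as in this sketch.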
Event Type
ACM Student Research Competition: Graduate Poster
ACM Student Research Competition: Undergraduate Poster
Doctoral Showcase
Posters
Research Posters
Scientific Visualization & Data Analytics Showcase
Time
Tuesday, 14 November 2023, 5:15pm - 7pm MST
Registration Categories
TP