
Session

This content is available for: Workshop Reg Pass.
Event Type: Workshop
Time: Sunday, 12 November 2023, 2pm - 5:30pm MST
Location: 710
Tags: Fault Handling and Tolerance
Registration Categories: W
Presentations
2:00pm - 2:05pm MST | Welcome to SuperCheck-SC23
2:05pm - 2:50pm MST | AI-Augmented SWARM Based Resilience for Integrated Research Infrastructures
2:50pm - 3:00pm MST | Lightning Talk: Diaspora – Resilient Event Processing for Irregular, Distributed Scientific Applications
3:00pm - 3:25pm MST | SuperCheck-SC23 – Afternoon Break
3:25pm - 3:50pm MST | Checkpoint/Restart for CUDA Kernels
3:50pm - 4:15pm MST | Implementation-Oblivious Transparent Checkpoint-Restart for MPI
4:15pm - 4:40pm MST | Asynchronous Multi-Level Checkpointing: An Enabler of Reproducibility using Checkpoint History Analytics
4:40pm - 4:50pm MST | Lightning Talk: Update on Checkpointing and Localized Recovery for Nested Fork-Join Programs
4:50pm - 5:00pm MST | Lightning Talk: Toward Efficient Asynchronous Checkpointing for Large-Language Models
5:00pm - 5:10pm MST | Lightning Talk: Inherent Checkpointing Properties of Nested Parallelism
5:10pm - 5:20pm MST | Lightning Talk: Trade-Offs For Developing File Aggregated I/O For Asynchronous Checkpointing
5:20pm - 5:30pm MST | Lightning Talk: Datastates for Debugging – Using Productive Checkpointing for Improved Debugging