Pinpointing Crash-Consistency Bugs in the HPC I/O Stack: A Cross-Layer Approach
Description
We present ParaCrash, a testing framework for studying crash recovery in a typical HPC I/O stack, and demonstrate its use by identifying 15 new crash-consistency bugs in various parallel file systems (PFS) and I/O libraries. ParaCrash uses a "golden version" approach to test the entire HPC I/O stack: storage state after recovery from a crash is correct if it matches the state that can be achieved by a partial execution with no crashes. It supports systematic testing of a multilayered I/O stack while properly identifying the layer responsible for the bugs.
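The "golden version" criterion in the description can be illustrated with a minimal sketch. This is not ParaCrash's implementation: the operation model (whole-file writes), the function names, and the choice to treat "partial execution" as a prefix of the operation sequence are all simplifying assumptions made here for illustration. The actual framework may admit a different set of legal states depending on the consistency model of each layer.

```python
from typing import Dict, List, Tuple

# Hypothetical model: each operation overwrites a whole file with new contents.
Op = Tuple[str, str]      # (path, contents)
State = Dict[str, str]    # path -> contents


def apply_ops(ops: List[Op]) -> State:
    """Replay operations without any crash and return the resulting state."""
    state: State = {}
    for path, data in ops:
        state[path] = data
    return state


def golden_states(ops: List[Op]) -> List[State]:
    """States reachable by crash-free partial executions.

    Assumption: a partial execution is a prefix of the operation sequence.
    """
    return [apply_ops(ops[:k]) for k in range(len(ops) + 1)]


def is_crash_consistent(recovered: State, ops: List[Op]) -> bool:
    """A recovered state is accepted if it equals some golden state."""
    return any(recovered == golden for golden in golden_states(ops))


ops = [("a.h5", "v1"), ("b.h5", "v1"), ("a.h5", "v2")]

# Matches the state after the first two operations: accepted.
ok = is_crash_consistent({"a.h5": "v1", "b.h5": "v1"}, ops)

# The last write survived but an earlier one was lost: rejected.
bad = is_crash_consistent({"a.h5": "v2"}, ops)
```

Under these assumptions, `ok` is true and `bad` is false: the second state cannot be produced by any crash-free prefix, which is the kind of mismatch a crash-consistency checker would flag.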
Event Type
Paper
Time
Thu, 18 Nov, 4pm - 4:30pm CST
Location
240-241-242
Tags
Applications
Big Data
Datacenter
File Systems and I/O
Machine Learning and Artificial Intelligence
State of the Practice
Storage
Registration Categories
TP
Reproducibility Badges