Presentation

Enabling Reproducibility and Scalability of Scientific Workflows in HPC and Cloud
Description
Scientific communities in fields such as earth science, biology, and materials science increasingly run complex workflows for scientific discovery. We work closely with these communities to leverage high-performance computing (HPC), big data analytics, and artificial intelligence/machine learning (AI/ML) to make their workflows faster and more productive. Our work addresses the new challenges brought about by this optimization process.

We identify three main challenges in these workflows: i) they integrate AI/ML methods with limited transparency and include many interoperable components (data and applications) that are hard to trace and reuse to reproduce results; ii) they hide the complexity of large intermediate data and their overall execution can be affected by the I/O bandwidth of the underlying infrastructure; and iii) they run on heterogeneous and distributed infrastructure with data and application dependencies that require efficient data management and resource allocation.

To address these challenges, we provide solutions that leverage the convergence between high-performance and cloud computing. First, we design and develop fine-grained containerized environments that enable data traceability and result explainability by automatically annotating and seamlessly attaching provenance information. Second, since the workflows are already containerized, we integrate them into HPC and cloud-native infrastructure and tune the storage technology to enable better I/O and data scalability. Finally, we orchestrate the end-to-end execution of workflows, ensuring efficient allocation of infrastructure resources and intermediate data management, and supporting the reproducibility and reusability of workflow executions.
Event Type
ACM Student Research Competition: Graduate Poster
ACM Student Research Competition: Undergraduate Poster
Doctoral Showcase
Posters
Research Posters
Scientific Visualization & Data Analytics Showcase
Time
Tuesday, 14 November 2023
5:15pm - 7pm MST
Tags
Reproducibility
Registration Categories
TP