Presentation

Maximizing Data Utility for HPC Python Workflow Execution
Description
Large-scale HPC workflows are increasingly implemented in dynamic languages such as Python, which allow for more rapid development than traditional techniques. However, the cost of executing Python applications at scale is often dominated by the distribution of common datasets and complex software dependencies. As the application scales up, data distribution becomes a limiting factor that prevents scaling beyond a few hundred nodes. To address this problem, we present the integration of Parsl (a Python-native parallel programming library) with TaskVine (a data-intensive workflow execution engine). Instead of relying on a shared filesystem to provide data to tasks on demand, Parsl is able to express advance data needs to TaskVine, which then performs efficient data distribution at runtime. This combination provides a performance speedup of 1.48x over the typical method of on-demand paging from the shared filesystem, while also providing an average task speedup of 1.79x with 2048 tasks and 256 nodes.
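To make the contrast between on-demand reads and advance data declaration concrete, the following is a minimal toy model, not the Parsl or TaskVine implementation. It assumes round-robin task placement and assumes that a declared input is fetched once per node and then reused from a node-local cache; the function name `count_transfers` and both assumptions are hypothetical and chosen only for illustration. The task and node counts are taken from the description above.

```python
def count_transfers(num_tasks, num_nodes, declared_inputs):
    """Count fetches of one common dataset under a toy model.

    declared_inputs=False: every task reads the dataset on demand.
    declared_inputs=True: the dataset is assumed to be fetched once
    per node and reused by later tasks on that node (hypothetical).
    """
    # Assumption of this sketch: tasks are placed round-robin on nodes.
    placement = [task % num_nodes for task in range(num_tasks)]

    if not declared_inputs:
        # On-demand model: each task triggers its own read.
        return {"fetches": num_tasks, "cache_hits": 0}

    cached_nodes = set()
    fetches = 0
    cache_hits = 0
    for node in placement:
        if node in cached_nodes:
            # The dataset is already present on this node.
            cache_hits += 1
        else:
            # First task on this node: the dataset is fetched once.
            cached_nodes.add(node)
            fetches += 1
    return {"fetches": fetches, "cache_hits": cache_hits}


if __name__ == "__main__":
    print(count_transfers(2048, 256, declared_inputs=False))
    print(count_transfers(2048, 256, declared_inputs=True))
```

Under these assumptions, the number of fetches scales with the node count rather than the task count. The model says nothing about actual transfer times, so it does not reproduce the reported speedups.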
Event Type
Workshop
Time
Sunday, 12 November 2023, 11am - 11:15am MST
Location
501-502
Tags
Applications
Distributed Computing
Large Scale Systems
Programming Frameworks and System Software
Runtime Systems
Registration Categories
W