Presentation

LLAMP: Assessing Network Latency Tolerance of HPC Applications with Linear Programming
Description
The shift towards high-bandwidth networks, driven by AI workloads in data centers and HPC clusters, has unintentionally increased network latency, which hurts the performance of communication-intensive HPC applications. Because MPI applications differ significantly in how well they tolerate network latency, accurately determining an application's latency resilience is crucial. Traditional methods for assessing this metric rely on specialized hardware or simulators and tend to be inflexible and time-consuming. In response, we introduce LLAMP, a novel toolchain that uses the LogGPS model and linear programming to analytically evaluate the network latency tolerance of HPC applications. LLAMP gives software developers and network architects essential insights for optimizing HPC infrastructures and for deploying applications strategically to minimize latency impacts. We validate our toolchain on several applications, including MILC, LULESH, and LAMMPS. Additionally, we present a case study with the ICON climate model, underscoring LLAMP's utility in improving the design and optimization of future HPC systems and applications.
Event Type
Paper
Time
Wednesday, 20 November 2024, 3:30pm - 4pm EST
Location
B309
Tags
Heterogeneous Computing
Linear Algebra
Network
Parallel Programming Methods, Models, Languages and Environments
Performance Evaluation and/or Optimization Tools
Registration Categories
TP
Award Finalists
Best Paper Finalist