
Presentation

Optimizing MPI Collectives on Shared Memory Multi-Cores
Description
Collective communication operations, such as broadcasts and reductions, are frequent performance bottlenecks in Message Passing Interface (MPI) programs. As the number of processor cores integrated into CPUs grows, it is increasingly common to run multiple MPI processes on a shared-memory machine to exploit hardware parallelism. In this context, optimizing MPI collective communication for shared-memory execution is crucial. This paper identifies two primary limitations of existing MPI collective implementations on shared-memory systems: extensive redundant data movement when performing reduction collectives, and ineffective use of non-temporal instructions for streamed data processing. To address these limitations, we propose two optimization techniques designed to minimize data movement and enhance the use of non-temporal instructions. We integrate our optimizations into OpenMPI and evaluate them through micro-benchmarks and real-world application tests on two multi-core clusters. Experiments show that our approach significantly outperforms existing techniques by 1.2-6.4x.
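To make the second optimization target concrete, the sketch below illustrates what "non-temporal instructions for streamed data" means on x86: copying a large, write-once buffer with streaming stores so the destination bypasses the cache hierarchy and avoids polluting it. This is an illustrative example only, not the paper's implementation; the function name `copy_streamed` and the SSE2 intrinsics shown are assumptions chosen for portability on x86-64.

```c
#include <emmintrin.h>  /* SSE2: _mm_loadu_si128, _mm_stream_si128, _mm_sfence */
#include <stddef.h>

/* Illustrative sketch (not the paper's code): copy `bytes` bytes from src to
 * dst using non-temporal stores. The destination must be 16-byte aligned and
 * `bytes` a multiple of 16; a real implementation would handle the edges. */
static void copy_streamed(void *dst, const void *src, size_t bytes) {
    __m128i *d = (__m128i *)dst;
    const __m128i *s = (const __m128i *)src;
    for (size_t i = 0; i < bytes / sizeof(__m128i); ++i) {
        __m128i v = _mm_loadu_si128(s + i);  /* ordinary cached load */
        _mm_stream_si128(d + i, v);          /* non-temporal (streaming) store */
    }
    _mm_sfence();  /* order streamed stores before subsequent accesses */
}
```

Streaming stores pay off when the destination will not be read again soon, which is typical of the intermediate copies in large-message collectives; for small buffers that stay cache-resident, ordinary stores are usually faster.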
Event Type
Paper
Time
Tuesday, 14 November 2023, 3:30pm - 4pm MST
Location
403-404
Tags
Distributed Computing
Message Passing
Programming Frameworks and System Software
Registration Categories
TP
Award Finalists
Best Student Paper Finalist
Reproducibility Badges