Presentation

Performance Portability of Programming Strategies for Nearest-Neighbor Communication with GPU-Aware MPI
Description
To better advise HPC application developers, we have implemented Faces, a nearest-neighbor microbenchmark that quantifies performance trade-offs. The Faces experiments presented here explore the following design choices: 1) fewer dependent messages versus more independent messages, 2) fewer fused GPU kernels versus more simple kernels, 3) number of GPU streams, 4) size of GPU thread blocks, and 5) linear versus blocked ordering of MPI ranks. We present weak-scaling performance of a latency-sensitive "small" per-rank domain and of a bandwidth-sensitive "large" per-rank domain, and we compare results for two high-performance computers with contrasting CPU, GPU, and interconnect architectures: Summit and Frontier. We find that using more independent messages tends to give better performance than using few dependent messages. We identify performance-portability recommendations for GPU streams and synchronization, but other aspects of performance show complicated dependence on problem size and computer.
Event Type
Workshop
Time
Monday, 13 November 2023, 11:16am - 11:25am MST
Location
605
Tags
Performance Measurement, Modeling, and Tools
Performance Optimization
Registration Categories
W