Performance Portability of Programming Strategies for Nearest-Neighbor Communication with GPU-Aware MPI

Session

2023 International Workshop on Performance, Portability, and Productivity in HPC (P3HPC)

Description

To better advise HPC application developers, we have implemented Faces, a nearest-neighbor microbenchmark that quantifies performance trade-offs. The Faces experiments presented here explore the following design choices: 1) fewer dependent messages versus more independent messages, 2) fewer fused GPU kernels versus more simple kernels, 3) number of GPU streams, 4) size of GPU thread blocks, and 5) linear versus blocked ordering of MPI ranks. We present weak-scaling performance for a latency-sensitive "small" per-rank domain and for a bandwidth-sensitive "large" per-rank domain, and we compare results on two high-performance computers with contrasting CPU, GPU, and interconnect architectures: Summit and Frontier. We find that using more independent messages tends to give better performance than using fewer dependent messages. We identify performance-portability recommendations for GPU streams and synchronization, but other aspects of performance show a complicated dependence on problem size and computer.
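The first design choice can be pictured with a small counting exercise. The sketch below is not taken from Faces; it assumes a cubic per-rank domain with n values per side, a halo one value wide, and a textbook three-stage exchange as the "dependent" variant. The function names are hypothetical. It compares how many messages each strategy sends, how many values they carry, and how many ordered stages they need.

```python
def independent_messages(n):
    """Sizes of the 26 messages sent when every neighbor gets its own message.

    Assumption: cubic n*n*n domain, halo width 1, so a rank has
    6 face neighbors, 12 edge neighbors, and 8 corner neighbors.
    """
    return [n * n] * 6 + [n] * 12 + [1] * 8


def dependent_messages(n):
    """Sizes of the messages in a three-stage, face-only exchange.

    Each stage must wait for the previous one, because it forwards halo
    values received earlier; that is how edge and corner data reach
    diagonal neighbors without dedicated messages.
    """
    return [
        [n * n] * 2,              # x stage: bare faces
        [(n + 2) * n] * 2,        # y stage: faces widened by the x halo
        [(n + 2) * (n + 2)] * 2,  # z stage: faces widened by x and y halos
    ]


def summarize(n):
    """Message count, total values, and ordered stages for both strategies."""
    ind = independent_messages(n)
    dep = dependent_messages(n)
    return {
        "independent": {
            "messages": len(ind),
            "values": sum(ind),
            "stages": 1,  # all 26 messages can be in flight at once
        },
        "dependent": {
            "messages": sum(len(stage) for stage in dep),
            "values": sum(sum(stage) for stage in dep),
            "stages": len(dep),
        },
    }


if __name__ == "__main__":
    for n in (4, 64):
        print(n, summarize(n))
```

Under these assumptions both strategies move the same number of values, (n + 2)^3 - n^3, so the trade-off is between 26 messages that can all be in flight together and 6 messages that must complete in three ordered stages.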

Author/Presenter

James B. White III

Hewlett Packard Enterprise (HPE)

Event Type

Workshop

Time

Monday, 13 November 2023, 11:16am - 11:25am MST

Location

605
