MPI-xCCL: A Portable MPI Library over Collective Communication Libraries for Various Accelerators
Description
The evolution of high-performance computing toward diverse accelerators, including NVIDIA, AMD, and Intel GPUs and Habana Gaudi accelerators, demands user-friendly and efficient utilization of these technologies. While both GPU-aware MPI libraries and vendor-specific communication libraries cater to communication requirements, trade-offs emerge based on library selection across various message sizes. Prioritizing usability, we therefore propose MPI-xCCL, a Message Passing Interface-based runtime with cross-accelerator support for efficient, portable, scalable, and optimized communication performance. MPI-xCCL incorporates vendor-specific libraries into GPU-aware MPI runtimes, ensuring multi-accelerator compatibility while adhering to the MPI standard. The proposed hybrid designs leverage the benefits of both MPI and xCCL algorithms transparently to the end user. We evaluated our designs on various HPC systems using the OSU Micro-Benchmarks and the deep learning framework TensorFlow with Horovod. On the NVIDIA-GPU-enabled ThetaGPU system, our designs outperformed Open MPI by 4.6x. On emerging Habana Gaudi-based systems, MPI-xCCL also delivered performance comparable to vendor-provided communication runtimes.
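The message-size trade-off underlying the hybrid design can be illustrated with a minimal sketch, not the actual MPI-xCCL implementation: a wrapper that dispatches an allreduce on GPU buffers to the vendor collective library (NCCL here) for large messages and to the GPU-aware MPI path for small ones. The wrapper name hybrid_allreduce and the crossover constant HYBRID_THRESHOLD are hypothetical, and the NCCL communicator and CUDA stream are assumed to be initialized elsewhere.

```c
/* Illustrative sketch only -- not the MPI-xCCL implementation.
 * Routes an allreduce on GPU-resident float buffers to NCCL for
 * large messages and to the GPU-aware MPI runtime for small ones,
 * mirroring the message-size trade-off described in the abstract.
 * hybrid_allreduce and HYBRID_THRESHOLD are hypothetical names. */
#include <mpi.h>
#include <nccl.h>
#include <cuda_runtime.h>

#define HYBRID_THRESHOLD (64 * 1024)  /* assumed crossover point, in bytes */

int hybrid_allreduce(const void *sendbuf, void *recvbuf, int count,
                     MPI_Comm mpi_comm, ncclComm_t nccl_comm,
                     cudaStream_t stream)
{
    size_t bytes = (size_t)count * sizeof(float);

    if (bytes >= HYBRID_THRESHOLD) {
        /* Large message: the vendor collective typically wins. */
        ncclResult_t rc = ncclAllReduce(sendbuf, recvbuf, (size_t)count,
                                        ncclFloat, ncclSum,
                                        nccl_comm, stream);
        if (rc != ncclSuccess)
            return MPI_ERR_OTHER;
        /* NCCL collectives are asynchronous on the stream. */
        cudaStreamSynchronize(stream);
        return MPI_SUCCESS;
    }

    /* Small message: the GPU-aware MPI path has lower latency. */
    return MPI_Allreduce(sendbuf, recvbuf, count, MPI_FLOAT,
                         MPI_SUM, mpi_comm);
}
```

In the abstract's design this selection happens transparently inside the standard MPI_Allreduce call rather than through a separate wrapper, so applications need no source changes; the sketch merely makes the dispatch logic explicit.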
Event Type
Workshop
Time
Sunday, 12 November 2023, 4:50pm - 5:10pm MST
Location
505
Tags
Distributed Computing
Middleware and System Software
Runtime Systems
Registration Categories
W