Near-Optimal Reduce on the Cerebras Wafer-Scale Engine
Description
Efficient reduce and allreduce communication collectives are crucial building blocks in many workloads, including deep learning training, and have been optimized for various architectures. We provide the first systematic investigation of the reduce operation on the Cerebras Wafer-Scale Engine (WSE) using the Cerebras SDK. We improve upon existing reduce implementations by up to 5x in certain settings. We show that, using at most three different implementations, we can achieve performance at most 1.38x slower than an optimal reduction tree. Finally, we provide an allreduce that outperforms patterns such as ring or butterfly by up to 2x.

We will (a) cover unique features of the Cerebras WSE, (b) introduce a model to accurately predict performance on the hardware, (c) discuss different reduce implementations, (d) analyze the results of running them in an accurate simulator and compare them against an optimal reduction tree, and (e) show how to extend them to an efficient allreduce.
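
As background for the collectives named above, the sketch below is a generic illustration in plain Python of a binary reduction tree and a ring allreduce, with lists standing in for per-PE buffers. It is not the authors' Cerebras SDK implementation, and the function and variable names (tree_reduce, ring_allreduce, chunks) are our own illustrative choices.

# Generic sketch of the two collectives discussed above; plain Python lists
# stand in for per-PE buffers. This is NOT the authors' Cerebras SDK code.

def tree_reduce(values, op=lambda a, b: a + b):
    """Binary reduction tree: combine pairs level by level in O(log n) depth."""
    while len(values) > 1:
        nxt = [op(values[i], values[i + 1]) for i in range(0, len(values) - 1, 2)]
        if len(values) % 2:          # an odd leftover carries to the next level
            nxt.append(values[-1])
        values = nxt
    return values[0]

def ring_allreduce(buffers, op=lambda a, b: a + b):
    """Ring allreduce: reduce-scatter followed by allgather over a ring of n ranks.
    Each rank's buffer is split into n chunks; after 2*(n-1) steps every rank
    holds the element-wise reduction of all buffers."""
    n = len(buffers)
    chunks = [list(b) for b in buffers]          # chunks[rank][chunk_index]
    # Reduce-scatter: after n-1 steps, rank r owns the full reduction of chunk (r+1) % n.
    for step in range(n - 1):
        sends = [(r, (r - step) % n, chunks[r][(r - step) % n]) for r in range(n)]
        for r, idx, val in sends:
            chunks[(r + 1) % n][idx] = op(chunks[(r + 1) % n][idx], val)
    # Allgather: circulate the reduced chunks so every rank ends with all of them.
    for step in range(n - 1):
        sends = [(r, (r + 1 - step) % n, chunks[r][(r + 1 - step) % n]) for r in range(n)]
        for r, idx, val in sends:
            chunks[(r + 1) % n][idx] = val
    return chunks

# Example: four values reduced via the tree, and three ranks each holding a
# three-element buffer for the ring allreduce.
print(tree_reduce([1, 2, 3, 4]))                          # 10
print(ring_allreduce([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))  # every rank: [12, 15, 18]

The actual work targets the WSE's on-wafer fabric, where routing and per-PE compute costs differ from this rank-based model; the sketch only shows what the collectives compute, not how they are mapped to the hardware.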
Event Type
ACM Student Research Competition: Graduate Poster
ACM Student Research Competition: Undergraduate Poster
Posters
Time
Wednesday, 15 November 2023, 10am - 5pm MST
Tags
Artificial Intelligence/Machine Learning
Algorithms
Applications
Architecture and Networks
Cloud Computing
Distributed Computing
Data Analysis, Visualization, and Storage
Performance Measurement, Modeling, and Tools
Programming Frameworks and System Software
Registration Categories
TP
XO/EX