Near-Optimal Reduce on the Cerebras Wafer-Scale Engine
Description
Efficient reduce and allreduce communication collectives are crucial building blocks in many workloads, including deep learning training, and have been optimized for various architectures. We provide the first systematic investigation of the reduce operation on the Cerebras Wafer-Scale Engine (WSE) using the Cerebras SDK. We improve upon existing reduce implementations by up to 5x in certain settings. We show that, using at most three different implementations, we can achieve performance at most 1.38x slower than an optimal reduction tree. Finally, we provide an allreduce that outperforms patterns such as ring or butterfly by up to 2x.

We will (a) cover unique features of the Cerebras WSE, (b) introduce a model to accurately predict performance on the hardware, (c) discuss different reduce implementations, (d) analyze the results of running them in an accurate simulator and compare them against an optimal reduction tree, and (e) show how to extend them to an efficient allreduce.
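
As background for the collectives named above, the sketch below is a generic illustration in plain Python of a binary reduction tree and a ring allreduce, with lists standing in for per-PE buffers. It is not the authors' Cerebras SDK implementation, and the function and variable names (tree_reduce, ring_allreduce, chunks) are our own illustrative choices.

# Generic sketch of the two collectives discussed above; plain Python lists
# stand in for per-PE buffers. This is NOT the authors' Cerebras SDK code.

def tree_reduce(values, op=lambda a, b: a + b):
    """Binary reduction tree: combine pairs level by level in O(log n) depth."""
    while len(values) > 1:
        nxt = [op(values[i], values[i + 1]) for i in range(0, len(values) - 1, 2)]
        if len(values) % 2:          # an odd leftover carries to the next level
            nxt.append(values[-1])
        values = nxt
    return values[0]

def ring_allreduce(buffers, op=lambda a, b: a + b):
    """Ring allreduce: reduce-scatter followed by allgather over a ring of n ranks.
    Each rank's buffer is split into n chunks; after 2*(n-1) steps every rank
    holds the element-wise reduction of all buffers."""
    n = len(buffers)
    chunks = [list(b) for b in buffers]          # chunks[rank][chunk_index]
    # Reduce-scatter: after n-1 steps, rank r owns the full reduction of chunk (r+1) % n.
    for step in range(n - 1):
        sends = [(r, (r - step) % n, chunks[r][(r - step) % n]) for r in range(n)]
        for r, idx, val in sends:
            chunks[(r + 1) % n][idx] = op(chunks[(r + 1) % n][idx], val)
    # Allgather: circulate the reduced chunks so every rank ends with all of them.
    for step in range(n - 1):
        sends = [(r, (r + 1 - step) % n, chunks[r][(r + 1 - step) % n]) for r in range(n)]
        for r, idx, val in sends:
            chunks[(r + 1) % n][idx] = val
    return chunks

# Example: four values reduced via the tree, and three ranks each holding a
# three-element buffer for the ring allreduce.
print(tree_reduce([1, 2, 3, 4]))                          # 10
print(ring_allreduce([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))  # every rank: [12, 15, 18]

The actual work targets the WSE's on-wafer fabric, where routing and per-PE compute costs differ from this rank-based model; the sketch only shows what the collectives compute, not how they are mapped to the hardware.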
Event Type
ACM Student Research Competition: Graduate Poster
ACM Student Research Competition: Undergraduate Poster
Posters
Time
Wednesday, 15 November 2023, 10am - 5pm MST
Tags
Artificial Intelligence/Machine Learning
Algorithms
Applications
Architecture and Networks
Cloud Computing
Distributed Computing
Data Analysis, Visualization, and Storage
Performance Measurement, Modeling, and Tools
Programming Frameworks and System Software
Registration Categories
TP
XO/EX