Automatic Generation of Distributed-Memory Mappings for Tensor Computations
Description
While considerable research has been directed at automatic parallelization for shared-memory platforms, little progress has been made in automatic parallelization schemes for distributed-memory systems. We introduce an innovative approach to automatically produce distributed-memory parallel code for an important sub-class of affine tensor computations common to Coupled Cluster (CC) electronic structure methods, neuro-imaging applications, and deep learning models.

We propose a novel systematic approach to modeling the relations and trade-offs of mapping computations and data onto multi-dimensional grids of homogeneous nodes. Our formulation explores the space of computation and data distributions across processor grids. A tensor program is modeled as a non-linear symbolic formulation that accounts for the volume of data communication and the per-node capacity constraints induced by specific mappings. Solutions are found iteratively using the Z3 SMT solver and are used to automatically generate efficient MPI code. Our evaluation demonstrates the effectiveness of our approach over Distributed-Memory Pluto and the Cyclops Tensor Framework.
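
To illustrate the flavor of such a formulation, the following is a minimal, hypothetical sketch (not the authors' tool) of posing a mapping problem to Z3 through its Python bindings: for a matrix multiplication C[i,j] += A[i,k] * B[k,j] mapped onto a 2D grid of P nodes, the grid shape and per-node tile extents are symbolic integers, a per-node memory-capacity constraint is imposed, and a simplified communication-volume expression is minimized by iteratively tightening a bound. All concrete names and numbers (P, I, J, K, MEM_PER_NODE, and the cost model itself) are illustrative assumptions, not the paper's formulation.

```python
from z3 import Int, Solver, sat

# Assumed problem sizes: 16 nodes, 4096^3 matrix multiply, capacity in words.
P, I, J, K = 16, 4096, 4096, 4096
MEM_PER_NODE = 12 * 1024 * 1024

p0, p1 = Int("p0"), Int("p1")   # processor-grid shape
ti, tj = Int("ti"), Int("tj")   # per-node tile extents of dimensions i and j
comm = Int("comm")              # per-node communication volume (words)

s = Solver()
s.add(p0 >= 1, p1 >= 1, p0 * p1 == P)     # grid uses all P nodes
s.add(ti * p0 == I, tj * p1 == J)         # block distributions divide evenly

# Per-node footprint: A tile (ti x K) + B tile (K x tj) + C tile (ti x tj).
s.add(ti * K + K * tj + ti * tj <= MEM_PER_NODE)

# Simplified cost model: A tiles travel along the second grid dimension,
# B tiles along the first.
s.add(comm == ti * K * (p1 - 1) + K * tj * (p0 - 1))

best = None
while s.check() == sat:
    best = s.model()
    # Iteratively tighten: demand strictly less communication than last model.
    s.add(comm < best[comm].as_long())

if best is not None:
    print("grid: %s x %s, tiles: %s x %s, comm words/node: %s" % (
        best[p0], best[p1], best[ti], best[tj], best[comm]))
```

The check-and-tighten loop echoes the iterative use of the solver described above; for these assumed sizes it converges on the square 4 x 4 grid. The actual formulation in the paper covers general affine tensor programs and multi-dimensional grids rather than this single kernel.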
Event Type
Paper
Time
Wednesday, 15 November 2023, 3:30pm - 4pm MST
Location
401-402
Tags
Artificial Intelligence/Machine Learning
Compilers
Performance Measurement, Modeling, and Tools
Performance Optimization
Programming Frameworks and System Software
Tensors
Registration Categories
TP
Reproducibility Badges