Automatic Generation of Distributed-Memory Mappings for Tensor Computations
Description
While considerable research has been directed at automatic parallelization for shared-memory platforms, little progress has been made in automatic parallelization schemes for distributed-memory systems. We introduce an innovative approach to automatically produce distributed-memory parallel code for an important sub-class of affine tensor computations common to Coupled Cluster (CC) electronic structure methods, neuro-imaging applications, and deep learning models.

We propose a novel systematic approach to modeling the relations and trade-offs of mapping computations and data onto multi-dimensional grids of homogeneous nodes. Our formulation explores the space of computation and data distributions across processor grids. A tensor program is modeled as a non-linear symbolic formulation that accounts for the volume of data communication and the per-node capacity constraints induced by specific mappings. Solutions are found iteratively using the Z3 SMT solver and are used to automatically generate efficient MPI code. Our evaluation demonstrates the effectiveness of our approach over Distributed-Memory Pluto and the Cyclops Tensor Framework.
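
To illustrate the flavor of such a formulation, the following is a minimal, hypothetical sketch (not the authors' tool) of posing a mapping problem to Z3 through its Python bindings: for a matrix multiplication C[i,j] += A[i,k] * B[k,j] mapped onto a 2D grid of P nodes, the grid shape and per-node tile extents are symbolic integers, a per-node memory-capacity constraint is imposed, and a simplified communication-volume expression is minimized by iteratively tightening a bound. All concrete names and numbers (P, I, J, K, MEM_PER_NODE, and the cost model itself) are illustrative assumptions, not the paper's formulation.

```python
from z3 import Int, Solver, sat

# Assumed problem sizes: 16 nodes, 4096^3 matrix multiply, capacity in words.
P, I, J, K = 16, 4096, 4096, 4096
MEM_PER_NODE = 12 * 1024 * 1024

p0, p1 = Int("p0"), Int("p1")   # processor-grid shape
ti, tj = Int("ti"), Int("tj")   # per-node tile extents of dimensions i and j
comm = Int("comm")              # per-node communication volume (words)

s = Solver()
s.add(p0 >= 1, p1 >= 1, p0 * p1 == P)     # grid uses all P nodes
s.add(ti * p0 == I, tj * p1 == J)         # block distributions divide evenly

# Per-node footprint: A tile (ti x K) + B tile (K x tj) + C tile (ti x tj).
s.add(ti * K + K * tj + ti * tj <= MEM_PER_NODE)

# Simplified cost model: A tiles travel along the second grid dimension,
# B tiles along the first.
s.add(comm == ti * K * (p1 - 1) + K * tj * (p0 - 1))

best = None
while s.check() == sat:
    best = s.model()
    # Iteratively tighten: demand strictly less communication than last model.
    s.add(comm < best[comm].as_long())

if best is not None:
    print("grid: %s x %s, tiles: %s x %s, comm words/node: %s" % (
        best[p0], best[p1], best[ti], best[tj], best[comm]))
```

The check-and-tighten loop echoes the iterative use of the solver described above; for these assumed sizes it converges on the square 4 x 4 grid. The actual formulation in the paper covers general affine tensor programs and multi-dimensional grids rather than this single kernel.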
Event Type
Paper
Time
Wednesday, 15 November 2023, 3:30pm - 4pm MST
Location
401-402
Tags
Artificial Intelligence/Machine Learning
Compilers
Performance Measurement, Modeling, and Tools
Performance Optimization
Programming Frameworks and System Software
Tensors
Registration Categories
TP
Reproducibility Badges