Presentation

Fine-Grained Accelerator Partitioning for Machine Learning and Scientific Computing in Function as a Service Platform

Session: 13th International Workshop on Runtime and Operating Systems for Supercomputers (ROSS)

Description: Function-as-a-service (FaaS) is a promising execution environment for high-performance computing (HPC) and machine learning (ML) applications because it offers developers a simple way to write and deploy programs. GPUs and other accelerators are now indispensable for HPC and ML workloads. However, state-of-the-art FaaS frameworks usually treat an accelerator as a single device that runs a single workload, and they offer little support for multiplexing accelerators.

In this work, we present techniques to multiplex GPUs with Parsl, a popular FaaS framework. With our enhancements, we show up to 60% lower task completion time and a 250% improvement in the throughput of a large language model when multiplexing a GPU compared with running without multiplexing. We plan to extend support for GPU multiplexing in FaaS platforms by tackling two challenges: changing the compute resources in a partition, and estimating how to right-size a GPU partition for a function.
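
As a rough illustration of what multiplexing a GPU among FaaS workers can involve, the sketch below plans per-worker environment variables so that several workers share one device. It is not taken from the presented work or from Parsl's API: the abstract does not say which partitioning mechanism is used, so this assumes NVIDIA's Multi-Process Service (MPS), where `CUDA_VISIBLE_DEVICES` pins a process to a device and `CUDA_MPS_ACTIVE_THREAD_PERCENTAGE` caps its share of compute. The function name `plan_gpu_partitions` and the equal-share policy are hypothetical.

```python
from typing import Dict, List


def plan_gpu_partitions(gpu_ids: List[str], workers_per_gpu: int) -> List[Dict[str, str]]:
    """Return one environment-variable mapping per worker.

    Hypothetical helper: each GPU is shared equally by `workers_per_gpu`
    workers. Assumes an NVIDIA MPS daemon is already running on the node.
    """
    if workers_per_gpu < 1:
        raise ValueError("workers_per_gpu must be at least 1")

    # Equal compute share per worker, as a whole-number percentage.
    share = 100 // workers_per_gpu

    plans: List[Dict[str, str]] = []
    for gpu in gpu_ids:
        for _ in range(workers_per_gpu):
            plans.append({
                # Pin the worker process to a single physical GPU.
                "CUDA_VISIBLE_DEVICES": gpu,
                # Cap the fraction of the GPU's compute this worker may use under MPS.
                "CUDA_MPS_ACTIVE_THREAD_PERCENTAGE": str(share),
            })
    return plans


if __name__ == "__main__":
    # Two GPUs, two workers each: four worker environments in total.
    for index, env in enumerate(plan_gpu_partitions(["0", "1"], workers_per_gpu=2)):
        print(index, env)
```

A launcher would apply each mapping to a worker process's environment before the worker initializes CUDA. The equal split is only a placeholder policy; right-sizing a partition for a given function is one of the open challenges the abstract names.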

Author/Presenters

Argonne National Laboratory

Rolando P. Hong Enriquez

Hewlett Packard Labs

Gourav Rattihalli

Hewlett Packard Labs