
Presentation

Keynote: Empowering Large AI Models Based on Heterogeneous Memory
Description
The size of large artificial intelligence (AI) models has increased by at least 100x in the past few years, leading to memory consumption at the scale of hundreds of GBs and even TBs. Recent advances in heterogeneous memory (HM) provide a cost-effective approach to increasing memory capacity. Using external memory (e.g., CXL memory expansion and GPU-like accelerators' memory) as an extension of GPU memory, we can build an HM that enables large-scale AI model inference and training without using extra GPUs to accommodate the large memory consumption. However, not only does HM impose challenges on tensor allocation and migration within the HM itself, but it is also unclear how HM affects training/inference throughput. AI model workloads possess unique memory access patterns and data structures, which create challenges for the promptness of data migration, load balancing, and tensor redundancy on the GPU. In this talk, I will discuss the work we have done to optimize the management of HM for large language models and graph neural networks. The key insight in our designs is to leverage AI domain knowledge to reconcile the tensions between multiple design targets (e.g., minimizing tensor migration volume and maintaining high system throughput). Finally, I will discuss the opportunities and challenges for future HM management in the era of large generative models.
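To make the tensor-placement problem concrete, the sketch below models a GPU with limited capacity backed by external memory (e.g., a CXL expansion), evicting least-recently-used tensors under allocation pressure and tracking total migration volume. The class name, the LRU policy, and all parameters are illustrative assumptions for exposition, not the speaker's actual design.

```python
from collections import OrderedDict

class HMTensorManager:
    """Toy heterogeneous-memory placement policy (illustrative only).

    Tensors reside either in fast, capacity-limited GPU memory or in
    slower external memory. On allocation pressure, the least-recently-
    used GPU-resident tensor is migrated out. `migrated_bytes` tracks
    the migration volume a real design would try to minimize.
    """

    def __init__(self, gpu_capacity_bytes):
        self.gpu_capacity = gpu_capacity_bytes
        self.gpu_used = 0
        self.gpu_tensors = OrderedDict()  # name -> size, in LRU order
        self.ext_tensors = {}             # name -> size, external memory
        self.migrated_bytes = 0           # total bytes moved across tiers

    def access(self, name, size):
        """Touch a tensor, fetching it into GPU memory if needed."""
        if name in self.gpu_tensors:
            self.gpu_tensors.move_to_end(name)  # refresh LRU position
            return
        if name in self.ext_tensors:
            del self.ext_tensors[name]
            self.migrated_bytes += size         # fetch from external tier
        self._make_room(size)
        self.gpu_tensors[name] = size
        self.gpu_used += size

    def _make_room(self, size):
        """Evict LRU tensors to external memory until `size` fits."""
        while self.gpu_used + size > self.gpu_capacity:
            victim, vsize = self.gpu_tensors.popitem(last=False)
            self.gpu_used -= vsize
            self.ext_tensors[victim] = vsize
            self.migrated_bytes += vsize        # eviction also costs bandwidth
```

A pure LRU policy like this ignores the AI-specific structure the talk highlights: model execution order is largely known ahead of time, so a domain-aware manager can prefetch and evict along the dataflow graph rather than reactively.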
Event Type
Workshop
Time
Friday, 17 November 2023, 8:40am - 9:30am MST
Location
405-406-407
Tags
Data Movement and Memory
Heterogeneous Computing
Registration Categories
W