Cost-Effective LLM Inference Solution Using SK hynix's AiM (Accelerator-in-Memory)
Description
Large language models (LLMs) are becoming increasingly popular for a variety of AI services, such as chatbots and virtual assistants. However, serving LLMs can be challenging due to their high operating costs and long service latency. The main challenge in serving LLMs is the memory bandwidth bottleneck: LLMs require a large amount of memory to store their parameters, and the bandwidth to that memory can be the limiting factor in inference speed. As LLM models continue to grow in size, this problem will only get worse.
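
To make the bandwidth argument concrete, here is a minimal back-of-the-envelope sketch (not from the presentation; all figures are illustrative assumptions): during autoregressive decoding, essentially all model weights must be streamed from memory once per generated token, so memory bandwidth divided by model size bounds decode throughput.

```python
# Rough upper bound on autoregressive decode throughput.
# All numbers below are hypothetical, for illustration only.

def max_tokens_per_second(num_params: float, bytes_per_param: float,
                          mem_bandwidth_gb_s: float) -> float:
    """Each generated token requires streaming (roughly) all weights once,
    so throughput is capped by memory bandwidth / model size in bytes."""
    model_bytes = num_params * bytes_per_param
    return (mem_bandwidth_gb_s * 1e9) / model_bytes

# Example: a 175-billion-parameter model stored in FP16 (2 bytes/param)
# on a device with 3,350 GB/s of memory bandwidth.
print(max_tokens_per_second(175e9, 2, 3350))   # ~9.6 tokens/s per sequence
```

Under these assumed numbers, a single sequence cannot decode faster than roughly ten tokens per second regardless of available compute, which is why the abstract frames bandwidth, not FLOPs, as the core serving bottleneck.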

We propose a new solution to the memory bandwidth bottleneck for serving LLMs. Our solution, called AiM (Accelerator-in-Memory), is SK hynix's processing-in-memory (PIM) device specialized for serving LLMs. AiM exploits the abundant bandwidth available inside the memory device to accelerate GEMV (general matrix-vector multiplication) operations, which are the most expensive operations in LLM inference. We evaluated AiM on a variety of LLM models and tasks. Our results show that AiM can significantly improve the performance and energy efficiency of LLM inference. For example, on the GPT-3 model, AiM achieves up to 10x speedup at lower cost and energy consumption than state-of-the-art GPU systems.
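
For readers unfamiliar with why GEMV dominates, the sketch below shows a single decoder feed-forward block at batch size 1: during token-by-token generation the activation is one vector, so each linear projection degenerates into a matrix-vector product that reads the full weight matrix to produce a single output vector. The NumPy code and its shapes are illustrative assumptions only, not the AiM programming model.

```python
import numpy as np

# Illustrative only: one decoder feed-forward block at batch size 1.
d_model, d_ff = 4096, 16384
w_up = np.random.randn(d_ff, d_model).astype(np.float16)    # weights, re-read every token
w_down = np.random.randn(d_model, d_ff).astype(np.float16)

x = np.random.randn(d_model).astype(np.float16)              # one token's activation vector
h = np.maximum(w_up @ x, 0)          # GEMV + ReLU
y = w_down @ h                       # GEMV

# Each weight element is loaded once and used for a single multiply-add,
# so arithmetic intensity is ~1 op/byte: memory bandwidth, not compute,
# determines how fast these GEMVs can run.
```

Performing the multiply-accumulate next to the DRAM arrays, as a PIM device does, sidesteps the external bandwidth limit that these GEMVs hit on a conventional GPU.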

We believe that AiM is a promising solution to the memory bandwidth bottleneck for serving LLMs. AiM can significantly improve the performance and energy efficiency of LLM inference, making it possible to deploy LLMs in real-world applications.
Event Type
Exhibitor Forum
Time
Wednesday, 15 November 2023, 10:30am - 11am MST
Location
503-504
Tags
Accelerators
Artificial Intelligence/Machine Learning
Architecture and Networks
Hardware Technologies
Registration Categories
TP
XO/EX