Cost-Effective LLM Inference Solution Using SK hynix's AiM (Accelerator-in-Memory)
Description
Large language models (LLMs) are becoming increasingly popular for a variety of AI services, such as chatbots and virtual assistants. However, serving LLMs can be challenging due to their high operating costs and long service latency. The main challenge in serving LLMs is the memory bandwidth bottleneck: LLMs require a large amount of memory to store their parameters, and the bandwidth to that memory can be the limiting factor in inference speed. As LLM models continue to grow in size, this problem will only get worse.
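
To make the bandwidth argument concrete, here is a minimal back-of-the-envelope sketch (not from the presentation; all figures are illustrative assumptions): during autoregressive decoding, essentially all model weights must be streamed from memory once per generated token, so memory bandwidth divided by model size bounds decode throughput.

```python
# Rough upper bound on autoregressive decode throughput.
# All numbers below are hypothetical, for illustration only.

def max_tokens_per_second(num_params: float, bytes_per_param: float,
                          mem_bandwidth_gb_s: float) -> float:
    """Each generated token requires streaming (roughly) all weights once,
    so throughput is capped by memory bandwidth / model size in bytes."""
    model_bytes = num_params * bytes_per_param
    return (mem_bandwidth_gb_s * 1e9) / model_bytes

# Example: a 175-billion-parameter model stored in FP16 (2 bytes/param)
# on a device with 3,350 GB/s of memory bandwidth.
print(max_tokens_per_second(175e9, 2, 3350))   # ~9.6 tokens/s per sequence
```

Under these assumed numbers, a single sequence cannot decode faster than roughly ten tokens per second regardless of available compute, which is why the abstract frames bandwidth, not FLOPs, as the core serving bottleneck.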

We propose a new solution to the memory bandwidth bottleneck for serving LLMs. Our solution, called AiM (Accelerator-in-Memory), is SK hynix's processing-in-memory (PIM) device specialized for serving LLMs. AiM exploits the abundant bandwidth available inside the memory device to accelerate GEMV (general matrix-vector multiplication) operations, which are the most expensive operations in LLM inference. We evaluated AiM on a variety of LLM models and tasks. Our results show that AiM can significantly improve the performance and energy efficiency of LLM inference. For example, on the GPT-3 model, AiM achieves up to 10x speedup at lower cost and energy consumption than state-of-the-art GPU systems.
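
For readers unfamiliar with why GEMV dominates, the sketch below shows a single decoder feed-forward block at batch size 1: during token-by-token generation the activation is one vector, so each linear projection degenerates into a matrix-vector product that reads the full weight matrix to produce a single output vector. The NumPy code and its shapes are illustrative assumptions only, not the AiM programming model.

```python
import numpy as np

# Illustrative only: one decoder feed-forward block at batch size 1.
d_model, d_ff = 4096, 16384
w_up = np.random.randn(d_ff, d_model).astype(np.float16)    # weights, re-read every token
w_down = np.random.randn(d_model, d_ff).astype(np.float16)

x = np.random.randn(d_model).astype(np.float16)              # one token's activation vector
h = np.maximum(w_up @ x, 0)          # GEMV + ReLU
y = w_down @ h                       # GEMV

# Each weight element is loaded once and used for a single multiply-add,
# so arithmetic intensity is ~1 op/byte: memory bandwidth, not compute,
# determines how fast these GEMVs can run.
```

Performing the multiply-accumulate next to the DRAM arrays, as a PIM device does, sidesteps the external bandwidth limit that these GEMVs hit on a conventional GPU.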

We believe that AiM is a promising solution to the memory bandwidth bottleneck for serving LLMs. AiM can significantly improve the performance and energy efficiency of LLM inference, making it possible to deploy LLMs in real-world applications.
Event Type
Exhibitor Forum
Time
Wednesday, 15 November 2023, 10:30am - 11am MST
Location
503-504
Tags
Accelerators
Artificial Intelligence/Machine Learning
Architecture and Networks
Hardware Technologies
Registration Categories
TP
XO/EX