Overcoming the Cost of Data Movement in AI Inference Accelerators
Description
The largest performance bottleneck and energy cost in neural network acceleration is fetching weight and activation values before general matrix-vector (GEMV) or general matrix-matrix (GEMM) computation. Traditional von Neumann architectures, even with large on-chip caches, spend as much as 90% of their energy on data movement and only 10% on actual calculation, which in most cases limits their energy efficiency to low-single-digit TOPS/W. Analog in-memory compute, where the memory cell is used as part of the MAC calculation, suffers from accuracy issues and requires additional support circuitry, such as analog-to-digital and digital-to-analog converters and compensation logic, that obviates its inherent low-power advantage, limiting the state of the art to 3 TOPS/W.
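As a rough illustration of why the 90/10 split caps efficiency, the Python sketch below multiplies a compute-only TOPS/W figure by the fraction of energy actually spent on calculation. Only the 10% compute fraction comes from the abstract; the 30 TOPS/W compute-only figure is a hypothetical input chosen for the example.

```python
# Back-of-the-envelope sketch: how a 90%/10% data-movement/compute energy
# split caps effective efficiency. Only the 10% compute fraction comes from
# the abstract; the 30 TOPS/W compute-only figure is a hypothetical input.

compute_only_tops_per_w = 30.0   # hypothetical: efficiency if all energy fed the MACs
compute_energy_fraction = 0.10   # from the abstract: ~10% of energy does calculation

effective_tops_per_w = compute_only_tops_per_w * compute_energy_fraction
print(f"Effective efficiency: {effective_tops_per_w:.1f} TOPS/W")  # -> 3.0 TOPS/W
```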

The novel Untether AI at-memory compute architecture stores all weights directly on-chip in specially designed low-power SRAM, using high-density bit cells tuned to feed the processing elements (PEs) with minimal energy. Because the PEs sit directly adjacent to the SRAM cells, each bit access consumes only 2 femtojoules. This represents an order-of-magnitude improvement over compiled memory cells and a three-order-of-magnitude improvement over fetching weights from external DRAM.
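A minimal sketch of what those per-bit figures imply for a single pass over a weight matrix. Only the 2 fJ/bit number is stated above; the compiled-SRAM and external-DRAM values are back-derived from the one- and three-order-of-magnitude claims, and the matrix dimensions and INT8 width are assumptions made for the example.

```python
# Energy to read a GEMM weight matrix once at different per-bit access costs.
# Only 2 fJ/bit is stated in the abstract; the other two values are implied
# by its "one order of magnitude" and "three orders of magnitude" claims.

K, N = 1024, 1024                 # assumed weight matrix dimensions (K x N)
bits_per_weight = 8               # assumed INT8 weights
weight_bits = K * N * bits_per_weight

energy_per_bit_fj = {
    "at-memory SRAM (stated)": 2.0,      # 2 fJ per bit access
    "compiled SRAM (implied)": 20.0,     # ~10x higher
    "external DRAM (implied)": 2000.0,   # ~1000x higher
}

for source, fj in energy_per_bit_fj.items():
    microjoules = weight_bits * fj * 1e-9  # fJ -> J is 1e-15, J -> uJ is 1e6
    print(f"{source:>26}: {microjoules:9.3f} uJ per full weight read")
```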
Event Type
Exhibitor Forum
Time
Wednesday, 15 November 2023, 11am - 11:30am MST
Location
503-504
Tags
Accelerators
Artificial Intelligence/Machine Learning
Architecture and Networks
Hardware Technologies
Registration Categories
TP
XO/EX