Presentation

From Stencils To Tensors: Running 3D Finite Difference Seismic Imaging on the Groq AI Inference Accelerator
Description
GroqChip™ is an AI accelerator optimized for running large-scale inference workloads with high throughput and ultra-low latency. It features a Tensor Streaming architecture optimized for matrix-oriented operations commonly found in AI, but the chip can also efficiently compute other applications such as HPC workloads that can be expressed as large-scale matrix multiplication. GroqChip uses a deterministic dataflow execution model that results in predictable and repeatable performance without runtime variation, and its RealScale™ chip-to-chip interconnect technology makes it possible to scale applications across cards in a node, or nodes in a rack, without hitting the bottlenecks of PCIe or the network.

Here, we explore how GroqChip and its architecture can deliver high performance for linear algebra-based applications in HPC. Seismic imaging typically relies on a 3D finite difference solver, which performs 3D stencil computations on a volume of data. The original stencil algorithm is not well suited to a tensor-based architecture, but we outline how the stencil operation can be transformed into tensor operations by decomposing the stencil and recomposing it into matrices. The finite difference step can then be computed with matrix multiplications and matrix transpositions. A single GroqChip can run the finite difference step for a sub-cube of data that is kept entirely in on-chip memory, while larger volumes are computed by mapping the computation to a full rack or several racks. Halo data is exchanged between GroqChip processors via the RealScale interconnect, so the application domain can grow without PCIe or internode communication becoming the bottleneck. The deterministic dataflow model supports efficient orchestration of data movement within the chip and between chips without ever stalling the compute units. Finally, numerical analysis and optimization allow us to leverage Groq TruePoint™ arithmetic to satisfy the numerical requirements of seismic imaging.
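
To illustrate the idea of recasting a stencil as matrix multiplications and transpositions, here is a minimal pure-Python sketch on a 2D grid. It is not the presenters' implementation and uses no Groq API. The second-order coefficients (1, -2, 1), the zero boundary values, and every function name are assumptions chosen for illustration. The 1D stencil coefficients are placed on the diagonals of a banded matrix. Multiplying by that matrix applies the stencil along one axis, and a transpose before and after the multiply applies it along the other axis. A 3D solver would repeat the same pattern once per axis.

```python
def banded_stencil_matrix(n, coeffs=(1.0, -2.0, 1.0)):
    """Return an n x n matrix with the 1D stencil coefficients on its diagonals."""
    half = len(coeffs) // 2
    mat = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for k, c in enumerate(coeffs):
            j = i + k - half
            if 0 <= j < n:  # points outside the grid are treated as zero
                mat[i][j] = c
    return mat


def matmul(a, b):
    """Plain dense matrix product of two lists of lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(row) for row in zip(*a)]


def laplacian_matmul(u):
    """Second differences along both axes using only matmul and transpose."""
    d_rows = banded_stencil_matrix(len(u))
    d_cols = banded_stencil_matrix(len(u[0]))
    along_axis0 = matmul(d_rows, u)
    # Transpose, multiply, transpose back: the same matmul now acts on axis 1.
    along_axis1 = transpose(matmul(d_cols, transpose(u)))
    return [[p + q for p, q in zip(r0, r1)] for r0, r1 in zip(along_axis0, along_axis1)]


def laplacian_stencil(u):
    """Reference: direct 5-point stencil with zero values outside the grid."""
    n, m = len(u), len(u[0])

    def at(i, j):
        return u[i][j] if 0 <= i < n and 0 <= j < m else 0.0

    return [[at(i - 1, j) + at(i + 1, j) + at(i, j - 1) + at(i, j + 1) - 4.0 * at(i, j)
             for j in range(m)] for i in range(n)]


# Small integer-valued grid so the two results can be compared exactly.
grid = [[float(i * i + 3 * j) for j in range(5)] for i in range(4)]
result = laplacian_matmul(grid)
reference = laplacian_stencil(grid)
```

On this small grid, `result` and `reference` agree entry by entry. The matrix form does extra multiplications by zero compared with the direct stencil, which is the usual trade made to map the work onto hardware built for dense matrix operations.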
Event Type
Exhibitor Forum
Time
Thursday, 16 November 2023, 11:30am - 12pm MST
Location
503-504
Tags
Accelerators
Artificial Intelligence/Machine Learning
Architecture and Networks
Hardware Technologies
Registration Categories
TP
XO/EX