Presentation

Mixed-Precision S/DGEMM Using the TF32 and TF64 Frameworks on Low-Precision AI Tensor Cores
Description
Using NVIDIA Tensor Cores has enabled significant acceleration of general matrix multiplication for applications in AI and in high-performance computing. Such specialized accelerators can provide a performance increase of between 8x and 20x, albeit at a loss of precision. However, many applications require higher precision. Fortunately, mixed-precision methods can be employed to maintain high precision while still taking advantage of the performance of lower-precision AI cores. We extend the state of the art by using NVIDIA's new TF32 framework, which not only lifts some of the constraints of previous frameworks but also provides equivalent precision and performance with a much simpler approach. We also propose a new framework called TF64 that emulates double-precision arithmetic on low-precision Tensor Cores. Although this framework does not yet exist in hardware, we validated the correctness of the idea and achieved the equivalent of 64-bit precision on 32-bit hardware.
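For readers unfamiliar with these error-correction schemes, the following is a minimal NumPy sketch (ours, not the presenters') of the split-and-correct idea that underlies such mixed-precision GEMMs: each FP32 operand is split into a part representable in the lower-precision format plus a residual, and a few low-precision products are combined to recover near-full accuracy. The `round_to_tf32` helper and the matrix sizes are illustrative assumptions; a real implementation would issue the partial products as Tensor Core MMAs (e.g., via cuBLAS or WMMA) rather than CPU matmuls.

```python
import numpy as np

def round_to_tf32(x: np.ndarray) -> np.ndarray:
    """Keep only 10 explicit mantissa bits of an FP32 value,
    mimicking the TF32 input format of NVIDIA Tensor Cores
    (truncation rather than hardware round-to-nearest, for simplicity)."""
    bits = x.astype(np.float32).view(np.uint32)
    return (bits & np.uint32(0xFFFFE000)).view(np.float32)

def split_gemm_tf32(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Error-corrected GEMM: split each FP32 operand into a
    TF32-representable high part and a residual low part, then
    combine three low-precision products with FP32 accumulation."""
    a_hi = round_to_tf32(a)
    a_lo = (a - a_hi).astype(np.float32)
    b_hi = round_to_tf32(b)
    b_lo = (b - b_hi).astype(np.float32)
    # The a_lo @ b_lo term is negligible at FP32 resolution and is dropped.
    return a_hi @ b_hi + a_hi @ b_lo + a_lo @ b_hi

rng = np.random.default_rng(0)
a = rng.standard_normal((256, 256)).astype(np.float32)
b = rng.standard_normal((256, 256)).astype(np.float32)

naive = round_to_tf32(a) @ round_to_tf32(b)      # plain TF32 product
corrected = split_gemm_tf32(a, b)                # split-and-correct
exact = a.astype(np.float64) @ b.astype(np.float64)

for name, c in [("tf32", naive), ("corrected", corrected)]:
    err = np.abs(c - exact).max() / np.abs(exact).max()
    print(f"{name:10s} max relative error: {err:.2e}")
```

The same splitting idea extends upward: representing each FP64 operand as a pair of FP32 values and combining the partial products is, in spirit, how one can approach 64-bit accuracy on 32-bit hardware, which is the direction the proposed TF64 framework explores.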
Event Type
Workshop
Time
Sunday, 12 November 2023, 4:10pm - 4:30pm MST
Location
708
Tags
Applications
Software Engineering
Registration Categories
W