Presentation

Automatic Generation of Micro-Kernels for Performance Portability of Matrix Multiplication on RISC-V Vector Processors

Session: Second International Workshop on RISC-V for HPC

Description: In this paper, we propose and evaluate several optimized implementations of the general matrix multiplication (GEMM) on two RISC-V processor cores from T-HEAD that implement the RISC-V vector extension (RVV): the C906 and the C910. Specifically, we address the problem of performance portability across these cores by means of an automatic assembly code generator, written in Python, that emits RVV code for high-performance computing (HPC) with a variety of combinations of architecture-specific and general optimizations.

Our experimental results, obtained with a number of automatically generated GEMM micro-kernels on both RISC-V architectures, reveal that the impact of each optimization differs depending on the target architecture. They also highlight the importance of automatically generating HPC RVV code to achieve performance portability while reducing developer effort. In addition, these optimizations deliver significant performance gains with respect to a state-of-the-art tuned BLAS library (OpenBLAS), reaching speed-ups of 3x and 1.3x on the C910 and C906, respectively.
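The description centers on a Python generator that emits RVV assembly for GEMM micro-kernels. The paper's generator is not reproduced here. The following is a minimal, hypothetical sketch of the idea, assuming RVV 1.0 mnemonics, FP32 data, and the register and packing conventions listed in its docstring, none of which come from the paper. Note that the T-HEAD C906 and C910 implement the earlier RVV 0.7.1 draft, whose mnemonics differ, so templates for those cores would have to change. The function `gen_microkernel` and its `nr` and `unroll` parameters are illustrative names, with loop unrolling standing in for one possible "general" optimization.

```python
"""Hypothetical sketch of a Python generator for an RVV GEMM micro-kernel.

Illustrative assumptions (not taken from the paper):
  * RVV 1.0 mnemonics, FP32 data, LMUL = 1.
  * a0 = kc (a positive multiple of `unroll`), a1 = packed A (MR floats per k),
    a2 = packed B (nr floats per k), a3 = C micro-tile (column-major),
    a4 = leading dimension of C in bytes.
  * MR = VLMAX, so the tile height follows the hardware vector length.
"""

FREGS = [f"ft{i}" for i in range(12)]  # caller-saved FP temporaries holding B values


def gen_microkernel(nr: int, unroll: int = 1, name: str = "ukernel") -> str:
    """Return assembly text computing C[MR x nr] += A[MR x kc] * B[kc x nr]."""
    if not 1 <= nr <= len(FREGS):
        raise ValueError(f"nr must be in [1, {len(FREGS)}] with this register allocation")
    if unroll < 1:
        raise ValueError("unroll must be >= 1")
    acc = [f"v{8 + j}" for j in range(nr)]  # one vector accumulator per column of C
    out = [
        "  .text",
        f"  .globl {name}",
        f"{name}:",
        "  vsetvli t0, zero, e32, m1, ta, ma",  # vl = VLMAX, used as MR
        "  slli t1, t0, 2",                      # t1 = MR * 4 bytes (A stride per k)
        "  mv t2, a3",
    ]
    for j in range(nr):  # load the C micro-tile, one column per vector register
        out += [f"  vle32.v {acc[j]}, (t2)", "  add t2, t2, a4"]
    out.append(f".L{name}_k:")
    for _ in range(unroll):  # each copy performs one rank-1 update of the tile
        out += ["  vle32.v v1, (a1)", "  add a1, a1, t1"]
        out += [f"  flw {FREGS[j]}, {4 * j}(a2)" for j in range(nr)]
        out += [f"  vfmacc.vf {acc[j]}, {FREGS[j]}, v1" for j in range(nr)]
        out.append(f"  addi a2, a2, {4 * nr}")
    out += [f"  addi a0, a0, -{unroll}", f"  bnez a0, .L{name}_k", "  mv t2, a3"]
    for j in range(nr):  # store the updated micro-tile back to C
        out += [f"  vse32.v {acc[j]}, (t2)", "  add t2, t2, a4"]
    out.append("  ret")
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    print(gen_microkernel(nr=4, unroll=2), end="")
```

Calling `gen_microkernel(nr=4, unroll=2)` emits a kernel with four accumulators and a two-way unrolled k loop. Varying such parameters per target, rather than hand-writing each variant, is the kind of exploration an automatic generator makes cheap.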

Authors/Presenters

Francisco Igual

Complutense University of Madrid

Luis Piñuel

Complutense University of Madrid

Sandra Catalán

Jaume I University, Spain

Héctor Martínez

Universidad de Córdoba

Adrián Castelló

Universidad Politécnica de Valencia

Enrique Quintana-Ortí