Optimizing Direct Convolutions on ARM Multi-Cores
Description
Convolution kernels are widely seen in deep learning workloads and are often responsible for performance bottlenecks. Recent research has demonstrated that a direct convolution approach can outperform the traditional convolution implementation based on tensor-to-matrix conversions. However, existing approaches for direct convolution still have room for performance improvement. We present NDIRECT, a new direct convolution approach that targets ARM-based multi-core CPUs commonly found in smartphones and HPC systems. NDIRECT is designed to be compatible with the data layout formats used by mainstream deep learning frameworks but offers new optimizations for the computational kernel, data packing, and parallelization. We evaluate NDIRECT by applying it to representative convolution kernels and demonstrating its performance on four distinct ARM multi-core CPU platforms. We compare NDIRECT against state-of-the-art convolution optimization techniques. Experimental results show that NDIRECT gives the best overall performance across evaluation scenarios and platforms.
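To make the contrast in the description concrete, the following is a minimal sketch of the two general strategies it mentions: a direct convolution loop nest, and a convolution done through a tensor-to-matrix conversion (often called im2col) followed by a matrix product. It is not NDIRECT and does not reflect its kernel, packing, or parallelization optimizations. The function names, the `[C][H][W]` and `[K][C][R][S]` layouts, and the stride-1, no-padding setup are all assumptions chosen for illustration.

```python
def direct_conv2d(x, w):
    """Naive direct convolution (assumed: stride 1, no padding).

    x: input as nested lists indexed [C][H][W]
    w: filters as nested lists indexed [K][C][R][S]
    Returns the output indexed [K][P][Q] with P = H-R+1, Q = W-S+1.
    """
    C, H, W = len(x), len(x[0]), len(x[0][0])
    K, R, S = len(w), len(w[0][0]), len(w[0][0][0])
    P, Q = H - R + 1, W - S + 1
    out = [[[0.0] * Q for _ in range(P)] for _ in range(K)]
    for k in range(K):
        for p in range(P):
            for q in range(Q):
                acc = 0.0
                # Read the input window in place; no intermediate matrix.
                for c in range(C):
                    for r in range(R):
                        for s in range(S):
                            acc += x[c][p + r][q + s] * w[k][c][r][s]
                out[k][p][q] = acc
    return out


def im2col_conv2d(x, w):
    """Same convolution via a tensor-to-matrix conversion plus a matrix product."""
    C, H, W = len(x), len(x[0]), len(x[0][0])
    K, R, S = len(w), len(w[0][0]), len(w[0][0][0])
    P, Q = H - R + 1, W - S + 1
    # Copy every C*R*S input window into its own flat row (the extra buffer).
    patches = [
        [x[c][p + r][q + s] for c in range(C) for r in range(R) for s in range(S)]
        for p in range(P) for q in range(Q)
    ]
    # Flatten each filter the same way, giving a K x (C*R*S) matrix.
    wmat = [
        [w[k][c][r][s] for c in range(C) for r in range(R) for s in range(S)]
        for k in range(K)
    ]
    # Matrix product: one dot product per (filter, output position) pair.
    return [
        [[sum(a * b for a, b in zip(wmat[k], patches[p * Q + q])) for q in range(Q)]
         for p in range(P)]
        for k in range(K)
    ]
```

Both functions compute the same result. The difference this sketch is meant to show is that the im2col variant materializes a `patches` buffer that duplicates overlapping input windows, while the direct variant reads the input tensor in place.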
Event Type
Paper
Time
Thursday, 16 November 2023, 10:30am - 11am MST
Location
403-404
Tags
Artificial Intelligence/Machine Learning
Codesign
Performance Optimization
Programming Frameworks and System Software
Registration Categories
TP