Simultaneous Evaluation of Mindful Fault Checking across the CPU and GPU
Description
This work comprehensively analyzes the overhead of implementing fault-checking algorithms for sparse preconditioned conjugate gradient (PCG) solvers on many-core and GPU-accelerated systems. Our objective is to selectively utilize GPUs for duplicate calculations, based on the numerical properties of the sparse matrices, to enhance the reliability and performance of linear system solutions. Relying on the relatively underutilized CPU for fault detection lets scientific applications manage their resources more efficiently on large-scale systems. By leveraging existing fault-checking techniques, we validate calculations and address potential numerical instabilities or precision-related issues during iterative solving. Through extensive experimentation on real hardware, we demonstrate the effectiveness of the conjugate gradient algorithm in providing accurate and reliable solutions for large linear systems.
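To make the idea of a duplicate calculation inside a PCG solve concrete, here is a minimal single-threaded sketch. It is not the poster's implementation: the description does not say which fault-checking technique is used, so the check here (periodically recomputing the true residual b - Ax and comparing it with the residual carried by the recurrence) is an assumed stand-in. The Jacobi preconditioner and the names pcg_with_check, check_every, check_tol, and inject_fault_at are likewise hypothetical, and nothing here runs on a GPU.

```python
# Illustrative sketch only: the checking rule and all names are assumptions,
# not taken from the poster.

def laplacian_1d_csr(n):
    """Build the SPD matrix tridiag(-1, 2, -1) in CSR form, plus its diagonal."""
    indptr, indices, data = [0], [], []
    for i in range(n):
        for j, v in ((i - 1, -1.0), (i, 2.0), (i + 1, -1.0)):
            if 0 <= j < n:
                indices.append(j)
                data.append(v)
        indptr.append(len(data))
    return indptr, indices, data, [2.0] * n


def csr_matvec(indptr, indices, data, x):
    """Return y = A x for a CSR matrix A."""
    y = [0.0] * (len(indptr) - 1)
    for i in range(len(y)):
        for k in range(indptr[i], indptr[i + 1]):
            y[i] += data[k] * x[indices[k]]
    return y


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def pcg_with_check(indptr, indices, data, diag, b, tol=1e-12, max_iter=500,
                   check_every=5, check_tol=1e-8, inject_fault_at=None):
    """Jacobi-preconditioned CG with a periodic residual check.

    Assumes A is symmetric positive definite and b is nonzero.
    Returns (x, iterations, faults_detected).
    """
    x = [0.0] * len(b)
    r = list(b)                                  # r = b - A x with x = 0
    z = [ri / di for ri, di in zip(r, diag)]     # apply Jacobi preconditioner
    p = list(z)
    rz = dot(r, z)
    b_norm = dot(b, b) ** 0.5
    faults = 0
    for it in range(1, max_iter + 1):
        Ap = csr_matvec(indptr, indices, data, p)
        alpha = rz / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * qi for ri, qi in zip(r, Ap)]
        if it == inject_fault_at:
            r[0] += 1.0                          # simulate a silent corruption of r
        converged = dot(r, r) ** 0.5 <= tol * b_norm
        if converged or it % check_every == 0:
            # Duplicate calculation: recompute the residual from scratch.
            Ax = csr_matvec(indptr, indices, data, x)
            true_r = [bi - yi for bi, yi in zip(b, Ax)]
            gap = [ti - ri for ti, ri in zip(true_r, r)]
            if dot(gap, gap) ** 0.5 > check_tol * b_norm:
                faults += 1
                r = true_r                       # repair the residual
                if dot(r, r) ** 0.5 <= tol * b_norm:
                    return x, it, faults
                z = [ri / di for ri, di in zip(r, diag)]
                p = list(z)                      # restart the search direction
                rz = dot(r, z)
                continue
        if converged:
            return x, it, faults
        z = [ri / di for ri, di in zip(r, diag)]
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, max_iter, faults
```

In this sketch the check costs one extra sparse matrix-vector product every check_every iterations, which is the kind of overhead the description proposes to analyze and to place on whichever processor is otherwise underutilized. Calling pcg_with_check on the output of laplacian_1d_csr(20) with inject_fault_at=3 exercises the detect, repair, and restart path.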
Event Type
ACM Student Research Competition: Graduate Poster
ACM Student Research Competition: Undergraduate Poster
Posters
Time
Wednesday, 15 November 2023, 10am - 5pm MST
Tags
Artificial Intelligence/Machine Learning
Algorithms
Applications
Architecture and Networks
Cloud Computing
Distributed Computing
Data Analysis, Visualization, and Storage
Performance Measurement, Modeling, and Tools
Programming Frameworks and System Software
Registration Categories
TP
XO/EX