Presentation

Investigating Anomalies in Compute Clusters: An Unsupervised Learning Approach
Description
As compute clusters used for running batch jobs continue to grow in scale and complexity, anomalies become significantly more frequent. Timely detection of anomalous events has become vital to maintaining system efficiency and availability. Our study presents an attention-based graph neural network (GNN) that detects anomalies in clusters at the compute-node level and provides detailed root cause analysis to pinpoint issues. Evaluated on real-world datasets, the attention-based GNN demonstrates the ability to accurately detect and localize anomalies.
Event Type
Posters
Research Posters
Time
Tuesday, 14 November 2023, 10am - 5pm MST
Registration Categories
TP
XO/EX