Presentation

Self-Service Monitoring of HPC and Openstack Jobs for Users
Description
Using the compute capacity of an HPC or OpenStack cluster correctly is often a stumbling block for users, especially those from non-traditional domains where a cluster is only a tool and not the subject of their research.

This paper describes a web portal called TrailblazingTurtle, built for HPC and OpenStack clusters, that lets users view the resources used and wasted by their jobs without having to modify their workflow. The metrics are collected from various data sources on the cluster to enable monitoring at the job and VM level, and are presented to users and staff members as a simple web application. This platform makes it easy for newer users to request the correct quantity of computing resources for their work, see their impact on the shared file system, and follow how the priority of their group in Slurm evolves.
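
As a rough illustration of what "resources used and wasted" can mean for a single job, the sketch below compares requested and measured CPU and memory. It is not taken from TrailblazingTurtle: the `JobUsage` record, its field names, and the `wasted_fraction` and `summarize` helpers are all hypothetical, and the sample numbers are invented for the example.

```python
from dataclasses import dataclass


@dataclass
class JobUsage:
    """Requested versus measured resources for one job (hypothetical record)."""
    cores_requested: int
    core_seconds_used: float   # CPU time actually consumed, summed over cores
    elapsed_seconds: float     # wall-clock runtime of the job
    mem_requested_gb: float
    mem_peak_gb: float         # highest memory use observed during the job


def wasted_fraction(used: float, allocated: float) -> float:
    """Return the share of an allocation that went unused, clamped to [0, 1]."""
    if allocated <= 0:
        return 0.0
    return min(1.0, max(0.0, 1.0 - used / allocated))


def summarize(job: JobUsage) -> dict[str, float]:
    """Compute wasted CPU and memory fractions for a single job."""
    # Cores are held for the whole runtime, so the allocation is cores x time.
    core_seconds_allocated = job.cores_requested * job.elapsed_seconds
    return {
        "cpu_wasted": wasted_fraction(job.core_seconds_used, core_seconds_allocated),
        "mem_wasted": wasted_fraction(job.mem_peak_gb, job.mem_requested_gb),
    }


# Invented sample: 8 cores held for one hour, but only 2 core-hours of CPU used.
job = JobUsage(cores_requested=8, core_seconds_used=7200.0,
               elapsed_seconds=3600.0, mem_requested_gb=32.0, mem_peak_gb=8.0)
report = summarize(job)
print(report)
```

A user seeing a job like this sample could reasonably ask for fewer cores and less memory on the next submission, which is the kind of feedback the abstract describes.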
Event Type
Workshop
Time
Sunday, 12 November 2023, 9:26am - 9:43am MST
Location
503-504
Tags
Cloud Computing
Resource Management
State of the Practice
Registration Categories
W