Presentation

Preemptive Scheduling of Stateful GPU-Intensive HPC Applications in Kubernetes
Description
Containers provide a new paradigm for building, packaging, deploying, and managing applications consistently across varying infrastructures. However, adopting containers in HPC has been difficult because of the combination of security and performance requirements. High resource utilization across GPU-intensive workloads is a crucial requirement for HPC clusters. Container orchestration platforms such as Kubernetes enable efficient management of HPC infrastructure for researchers who need access to scalable high-performance facilities. However, the resource utilization of such orchestration frameworks with GPU-intensive HPC workloads remains relatively unexplored. In this paper, we present kube-criu-scheduler, a Kubernetes scheduler that builds on a recently introduced container checkpointing feature to enable preemptive scheduling of GPU-accelerated HPC applications. Our results show that the resulting efficiency and reliability gains are achieved with negligible impact on application performance.
Event Type
Workshop
Time
Monday, 13 November 2023, 9:45am - 9:50am MST
Location
607
Registration Categories
W