MLS: Multilevel Scheduling in Large Scale High Performance Computers
Time: Tuesday, June 23rd, 4:25pm - 4:28pm
Description: HPC systems exhibit increased parallelism at multiple levels and of various forms. Effectively exposing, expressing, and exploiting this multilevel and diverse parallelism is an open challenge.
The MLS project addresses a research question that follows directly from this challenge: Given massive parallelism at multiple levels and of diverse forms and granularities, how can parallel and distributed applications expose, express, and exploit it such that execution times are reduced, performance targets are achieved, and acceptable efficiency is maintained?
The MLS project employs a multilevel approach to achieve scalable scheduling in large-scale HPC systems across the multiple levels of parallelism, with a focus on software parallelism. MLS connects the scheduling decisions across the levels of software parallelism. This distinguishes it from hierarchical scheduling, which is traditionally employed to achieve scalability within a single level of parallelism.
Specifically, MLS extends and bridges the most successful job, process, and thread scheduling models beyond one or two levels of parallelism (scaling up) and beyond their current scale (scaling out).
Current results include prototypes that connect job scheduling with process scheduling and process scheduling with thread scheduling. These prototypes have been tested in simulation and in real experiments using computational benchmarks and real applications.
The final outcome of the project will be a prototype multilevel scheduling solution that integrates live feedback about the state of the application and the system from the three currently disjoint scheduling levels: job, process, and thread.