Analysis of Scheduling Policies for Next-Generation Rabbit Architecture
Description: The Livermore El Capitan supercomputer is planned to have a brand-new architecture in which Rabbit nodes containing SSDs are placed at the top of the racks. The SSDs can then be accessed either directly over a PCIe connection or through the network fabric, which creates opportunities for new scheduling policies. The objective of this work is to evaluate scheduling policies in the context of this new architecture. We do so by determining which metrics are relevant, evaluating existing policies under those metrics to identify potential shortcomings, exploring new policies that address those shortcomings, and ultimately evaluating the new policies on the real hardware once it becomes available; the work remains ongoing until that evaluation is possible. So far, this work has evaluated common existing policies in the Flux scheduler and has examined a simple new policy that partially solves some of the problems identified with the existing policies.
ACM Student Research Competition: Graduate Poster