Presentation

When to Checkpoint at the End of a Fixed-Length Reservation?
Description
Consider an application executing for a fixed duration. The checkpoint duration is a random variable that obeys some well-known probability distribution law. The question is when to take a checkpoint towards the end of the execution so that the expected amount of work done is maximized. In the first scenario, a checkpoint can be taken at any time. We provide the optimal solution for a variety of probability distribution laws modeling checkpoint duration.

In the second scenario, the application is a chain of tasks with IID stochastic execution times, and a checkpoint can be taken only at the end of a task. First, we introduce a static strategy where we compute, at the beginning of the execution, the optimal number of tasks to run before the checkpoint. Then, we design a dynamic strategy that decides at the end of each task whether to checkpoint or to continue execution.
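
To make the first scenario concrete, here is a minimal Python sketch under a simplified model that is an assumption of this example, not taken from the presentation: the reservation has length R, the checkpoint starts x time units before the end, and the work done up to time R - x is saved only if the checkpoint completes within those x units (otherwise nothing is saved). The expected saved work is then (R - x) * F(x), where F is the CDF of the checkpoint duration. The names `expected_work`, `best_margin`, `uniform_cdf`, and `exponential_cdf` are hypothetical, and the grid search stands in for whatever closed-form solutions the authors derive.

```python
import math


def expected_work(R, x, cdf):
    """Expected saved work when the checkpoint starts x time units before the end.

    Assumed model: the R - x units of work are saved only if the checkpoint
    finishes within the remaining x units; otherwise nothing is saved.
    """
    return (R - x) * cdf(x)


def best_margin(R, cdf, steps=100_000):
    """Grid search over x in [0, R] for the margin that maximizes expected work."""
    best_x, best_val = 0.0, 0.0
    for i in range(steps + 1):
        x = R * i / steps
        val = expected_work(R, x, cdf)
        if val > best_val:
            best_x, best_val = x, val
    return best_x, best_val


def uniform_cdf(a, b):
    """CDF of a checkpoint duration uniformly distributed on [a, b]."""
    def cdf(x):
        if x <= a:
            return 0.0
        if x >= b:
            return 1.0
        return (x - a) / (b - a)
    return cdf


def exponential_cdf(rate):
    """CDF of an exponentially distributed checkpoint duration."""
    def cdf(x):
        return 1.0 - math.exp(-rate * x) if x > 0 else 0.0
    return cdf


if __name__ == "__main__":
    R = 10.0
    for name, cdf in [("uniform[1, 9]", uniform_cdf(1.0, 9.0)),
                      ("exponential(rate=1)", exponential_cdf(1.0))]:
        x, val = best_margin(R, cdf)
        print(f"{name}: start checkpoint {x:.3f} before the end, "
              f"expected work {val:.3f}")
```

Under this assumed model, a uniform checkpoint duration on [a, b] gives the margin (R + a) / 2, capped at b, which the grid search reproduces: for R = 10 and a duration uniform on [1, 9], the best margin is close to 5.5. Starting the checkpoint earlier wastes reservation time, while starting it later risks the checkpoint not finishing.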
Event Type
Workshop
Time
Sunday, 12 November 2023, 5:05pm - 5:29pm MST
Location
605
Tags
Fault Handling and Tolerance
Large Scale Systems
Registration Categories
W