Effects of checkpointing and queueing on program performance

TitleEffects of checkpointing and queueing on program performance
Publication TypeJournal Article
Year of Publication1990
AuthorsVG Kulkarni, VF Nicola, and KS Trivedi
JournalCommunications in Statistics. Stochastic Models
Volume6
Issue4
Start Page615
Pagination615 - 648
Date Published01/1990
Abstract

Checkpointing is a technique for reducing the completion (execution) time of long-running batch programs in the presence of failures. It consists of intermittently saving the current status of the program under execution so that if a failure occurs, the program needs to be restarted from the most recent checkpoint rather than from the beginning. Because of occasional long reprocessing times and the possibility of failures during recovery, checkpoints during reprocessing is guaranteed to increase the effectiveness of any checkpointing technique. In all models considered thus far, checkpointing is allowed only during useful processing of the program. In this paper, we carry out the analysis of a model with checkpointing during reprocessing following failures. Closed-form expressions for the Laplace transform of the program completion time, its mean and variance are derived. It is shown that, asymptotically, the expected completion time increases linearly (exponentially) with the service requirement in the presence (absence) of checkpointing. The question of whether checkpointing is beneficial is addressed and the optimal checkpointing rate, α*, which minimizes the expected completion time, is computed. The completion time analysis is used as a building block in a queueing model with a Poisson input stream of jobs. This system can be viewed as an M/G/1 queue, in which the first customer starting a busy period is having a different distribution for its service requirement. We derive this distribution and then determine several measures of interest. The checkpointing rate, α**, which minimizes the mean response time is computed. It is numerically shown that α** is very close to α* for a wide range of system parameter values. Finally, sensitivity analysis is performed to study the effect of using non-optimal checkpointing rates on the mean completion time and the mean response time. © 1990, Taylor & Francis Group, LLC. All rights reserved.

DOI10.1080/15326349908807166
Short TitleCommunications in Statistics. Stochastic Models