Reliability simulation of fault-tolerant software and systems

Title: Reliability simulation of fault-tolerant software and systems
Publication Type: Journal Article
Year of Publication: 1997
Authors: SS Gokhale, MR Lyu, and KS Trivedi
Journal: Proceedings of the Pacific Rim International Symposium on Fault Tolerant Systems, PRFTS
Start Page: 167
Pagination: 167 - 173
Date Published: 12/1997
Abstract

Fault tolerance is a survival attribute of complex computer systems and software in their ability to deliver continuous service to their users in the presence of faults. Formulating an analytic model for dependability and performance evaluation of hardware/software fault-tolerant architectures can be quite cumbersome. Also, in practice, isolating the effect of various parameters on a system while holding the others constant requires exploring a variety of scenarios. It is economically infeasible to build several such systems. Simulation offers an attractive mechanism for dependability evaluation and the study of the influence of various parameters on the failure behavior of the system. In this paper, we develop algorithms to simulate the failure behavior of three commonly used fault-tolerant architectures, viz., Distributed Recovery Block (DRB), N-Version Programming (NVP) and N-Self Checking Programming (NSCP). We demonstrate the ability of the approach to simulate complex failure scenarios with various dependencies using some illustrative numerical examples.

Short Title: Proceedings of the Pacific Rim International Symposium on Fault Tolerant Systems, PRFTS