Availability Analysis of Systems Deploying Sequences of Environmental-Diversity-Based Recovery Methods

Year of Publication: 2021
Authors: K Qiu, Z Zheng, KS Trivedi, and I Mura
Journal: IEEE Transactions on Reliability
Date Published: 09/2021

Mandelbug-caused software failures are significant threats to system availability, especially in the context of mission-critical and safety-critical systems. However, there is still no systematic method for keeping software free from Mandelbugs before release. To guarantee the availability of systems suffering from Mandelbugs, environmental-diversity-based fault tolerance techniques have been proposed to recover from the failures they cause. In this article, we develop and study an analytic model to assess the availability of systems that utilize a sequence of environmental-diversity-based recovery methods. Improving over previous relevant studies, the availability formula we obtain in this article works for any number of recovery methods the system is equipped with; it is also independent of both the nature of those recovery methods and the order of their utilization. In addition, we consider the problem of how to arrange the set of available recovery methods to achieve the largest system availability. Based on the results of our analysis, we develop an open-source tool, called OPENS, which assists in the calculation of the optimal system availability. We validate the effectiveness of the proposed modeling approach in two ways, namely by comparing our results with those obtained for specific systems considered in relevant studies and by conducting numerical analyses for more general scenarios of its application.

