System resiliency quantification using non-state-space and state-space analytic models

Publication Type: Journal Article
Year of Publication: 2013
Authors: R Ghosh, D Kim, and KS Trivedi
Journal: Reliability Engineering & System Safety
Start Page: 109
Pagination: 109-125
Date Published: June 2013

© 2013 Elsevier Ltd. All rights reserved. Resiliency is becoming an important service attribute for large-scale distributed systems and networks. Key problems in resiliency quantification are the lack of consensus on a definition of resiliency and the lack of a systematic approach to quantifying system resiliency. In general, resiliency is defined as the ability of a system, person, or organization to recover from, defy, or resist any shock, insult, or disturbance [1]. Many researchers interpret resiliency as a synonym for fault tolerance and reliability/availability. However, the effect of failures and repairs on systems is already covered by reliability and availability measures, and their effect on individual jobs is well covered under the umbrella of performability [2] and task completion time analysis [3]. We use the definition of Laprie [4] and Simoncini [5], in which resiliency is the persistence of service delivery that can justifiably be trusted when facing changes. The changes we refer to here lie beyond the envelope of system configurations already considered during system design, that is, beyond fault tolerance. In this paper, we outline a general approach for system resiliency quantification. Using examples of non-state-space and state-space stochastic models, we quantify, analytically and numerically, the resiliency of system performance, reliability, availability, and performability measures with respect to structural and parametric changes.
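The abstract does not give the models themselves, so the following is only a minimal sketch under our own assumptions, not the paper's method. It pairs a non-state-space model (a reliability block diagram for a hypothetical two-tier system, subjected to a structural change: losing one of two redundant servers) with a state-space model (a two-state up/down continuous-time Markov chain, subjected to a parametric change: the failure rate jumps while the system sits in its old steady state). The "settling time" below, i.e. the time for transient availability to come within `eps` of its new steady state, is one possible resiliency metric chosen for illustration. All function names, rates, and availabilities are hypothetical.

```python
import math


# --- Non-state-space model: reliability block diagram (RBD) ---

def series(*avails: float) -> float:
    """Availability of independent blocks in series: every block must be up."""
    result = 1.0
    for a in avails:
        result *= a
    return result


def parallel(*avails: float) -> float:
    """Availability of independent redundant blocks: at least one must be up."""
    all_down = 1.0
    for a in avails:
        all_down *= 1.0 - a
    return 1.0 - all_down


# --- State-space model: two-state (up/down) CTMC ---

def steady_availability(lam: float, mu: float) -> float:
    """Steady-state availability mu / (lam + mu) for failure rate lam, repair rate mu."""
    return mu / (lam + mu)


def transient_availability(t: float, a0: float, lam: float, mu: float) -> float:
    """Solution of dA/dt = mu - (lam + mu) * A with A(0) = a0."""
    a_inf = steady_availability(lam, mu)
    return a_inf + (a0 - a_inf) * math.exp(-(lam + mu) * t)


def settling_time(a0: float, lam: float, mu: float, eps: float) -> float:
    """Hypothetical resiliency metric: earliest t with |A(t) - A_inf| <= eps."""
    gap = abs(a0 - steady_availability(lam, mu))
    if gap <= eps:
        return 0.0
    return math.log(gap / eps) / (lam + mu)


if __name__ == "__main__":
    # Structural change (hypothetical numbers): two web servers in parallel,
    # in series with one database; then one web server is removed.
    before = series(parallel(0.99, 0.99), 0.999)
    after = series(0.99, 0.999)
    print(f"RBD availability before: {before:.6f}, after: {after:.6f}")

    # Parametric change (hypothetical rates per hour): failure rate rises
    # from 0.001 to 0.01 while the repair rate stays at 0.5.
    mu = 0.5
    a0 = steady_availability(0.001, mu)  # system starts in the old steady state
    ts = settling_time(a0, 0.01, mu, eps=1e-4)
    print(f"new steady state: {steady_availability(0.01, mu):.6f}, "
          f"settling time: {ts:.2f} h")
```

The two-state chain is used because its transient solution has a closed form; for a larger CTMC one would instead solve the Kolmogorov forward equations numerically (for example via a matrix exponential), which is closer in spirit to the analytic-numeric approach the abstract describes.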

Short Title: Reliability Engineering & System Safety