Review of the green book from Veena Mendiratta

I have long been familiar with Professor Trivedi’s book Probability and Statistics with Reliability, Queuing and Computer Science Applications [1], as well as his books on SHARPE [2] and on Queueing Networks and Markov Chains [3], so I was eager to read his new book, Reliability and Availability Engineering: Modeling, Analysis and Applications, co-authored with Andrea Bobbio.

Since one of the authors has written several previous books on reliability, it is natural to ask what is different about this latest one. The probability book [1] is introductory and covers reliability as well as queueing. It aims to teach probability, stochastic processes and statistics, illustrating these concepts via reliability, queueing, and computer science examples. For a deeper treatment of queueing systems and networks, [3] is a good reference. Similarly, a deeper treatment of reliability and availability is presented in this new book. It contains not only detailed algorithms and examples for most model types employed in reliability and availability modeling but also examples drawn from real systems. For example, fault trees are mentioned in [1] and a few examples are given, while the new book has a full chapter on fault trees. There is a single, simple example of a semi-Markov process in [1], while a whole chapter is devoted to this topic in the new book. Similarly, there are a few examples of multi-level modeling in [1], whereas three chapters are devoted to multi-level modeling in the new book. All examples used to illustrate probability concepts in the earlier book are, of necessity, toy examples, while many large, real examples are discussed in the new book.

In this book the authors have produced an impressive volume consisting of all existing methods of reliability and availability modeling with complete mathematical details and relevant algorithms. As one would expect from these authors, all mathematical concepts are illustrated through numerous examples. Some of the examples are continued through multiple chapters in order to show the application of different techniques on the same basic example. Many (unsolved) problems are provided for the reader to further practice on their own.  

The book covers not only classical techniques like non-state-space methods (e.g. reliability block diagrams, network reliability and fault trees) and state-space methods (e.g. continuous-time Markov chains), but also newer topics and analysis techniques, like binary decision diagrams, dynamic fault trees, Bayesian belief networks and stochastic Petri nets. More advanced techniques to relax the ubiquitous assumption of exponential distributions are addressed in three chapters: one on non-homogeneous Markov chains, the second on semi-Markov and Markov regenerative processes, and the third on the use of phase-type expansions. The book illustrates how to combine different models into a multi-level model to take advantage of the capabilities of different techniques in different parts of the overall model. This avoids the complexity of generating, storing and solving a large monolithic model. Two full chapters are devoted to developing this idea, which is later used heavily in the case studies chapter.
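To make the multi-level idea concrete, here is a minimal sketch of my own; it is not taken from the book or from SHARPE. Each component is modeled at the lower level as a two-state (up/down) continuous-time Markov chain, whose steady-state availability is mu / (lambda + mu), and the resulting numbers are fed into an upper-level reliability block diagram. The failure and repair rates, the component names and the assumption that components fail and are repaired independently are all hypothetical.

```python
def component_availability(failure_rate, repair_rate):
    """Lower level: steady-state availability of a two-state (up/down) CTMC."""
    return repair_rate / (failure_rate + repair_rate)


def series(*availabilities):
    """Upper level: a series block is up only if every block is up."""
    result = 1.0
    for a in availabilities:
        result *= a
    return result


def parallel(*availabilities):
    """Upper level: a parallel block is down only if every block is down."""
    all_down = 1.0
    for a in availabilities:
        all_down *= 1.0 - a
    return 1.0 - all_down


# Hypothetical rates (per hour), chosen only for illustration.
server = component_availability(failure_rate=1e-3, repair_rate=1e-1)
switch = component_availability(failure_rate=1e-4, repair_rate=5e-1)

# Two redundant servers in parallel, in series with a single switch.
system = series(parallel(server, server), switch)
print(f"server={server:.6f} switch={switch:.6f} system={system:.6f}")
```

The point of the sketch is only that each submodel stays small: no state space for the whole system is ever generated. This simple composition relies on the independence assumption stated above.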

Many real-life problems are developed and solved in detail. For example, the current return network subsystem of the Boeing 787 is cast as a network reliability problem, and a bounding algorithm is developed to solve this otherwise intractable problem. This algorithm is incorporated in the SHARPE [2] software package and is being used by Boeing for FAA certification for all planes with this subsystem. The IBM Blade Center availability model is developed in detail. IBM’s implementation of SIP on WebSphere is analyzed for availability, accounting for both hardware and software failures, the various stages of recovery, and imperfect coverage. Besides system availability, a customer-affecting metric known as DPM (defects per million) is studied; it accounts for the various call processing phases and their interaction with recovery after a component failure, as well as retry attempts. The Sun Microsystems high-availability platform and Cisco router availability models are also developed in detail.

Although software reliability and availability do not have dedicated chapters, many examples contain software failure and recovery. Software aging and rejuvenation are also covered through examples. Other unique topics covered are combined performance and availability (performability), survivability, parametric uncertainty propagation and some examples dealing with cybersecurity. 

In preparing the book the authors clearly had to make some choices, as no single book can cover all the aspects of a general discipline like reliability. They decided to concentrate their efforts on analytic models that can lead to quantifiable solutions via closed-form or numerical techniques. Throughout the book, they show how the proposed models can be solved, either in closed form or numerically by means of a software package like SHARPE, whose use is also encouraged for solving the problems.

The book is well organized in six parts covering: Part I Introduction, Part II Non-State-Space (Combinatorial) Models, Part III State-Space Models with Exponential Distributions, Part IV State-Space Models with Non-Exponential Distributions, Part V Multi-Level Models, Part VI Case Studies. This book can be used as a textbook for a course on reliability, and as a reference book for researchers and practicing engineers. A solution manual and slides of all chapters are expected to be available soon. 

References 

  1. Kishor S. Trivedi. Probability and Statistics with Reliability, Queuing and Computer Science Applications (2nd edition). John Wiley and Sons, 2001; Revised Paperback, 2017; Chinese translation, 2016.

  2. Robin A. Sahner, Kishor S. Trivedi, and Antonio Puliafito. 1996. Performance and Reliability Analysis of Computer Systems: An Example-Based Approach Using the SHARPE Software Package. Kluwer Academic Publishers, Norwell, MA, USA.

  3. Gunter Bolch, Stefan Greiner, Hermann de Meer, and Kishor Trivedi. 2006. Queueing Networks and Markov Chains (2nd edition). John Wiley and Sons.