Cache error propagation model

Title: Cache error propagation model
Publication Type: Journal Article
Year of Publication: 1997
Authors: AK Somani and KS Trivedi
Journal: Proceedings of the Pacific Rim International Symposium on Fault Tolerant Systems, PRFTS
Start Page: 15
Pagination: 15 - 21
Date Published: 12/1997
Abstract

Cache memory is a small, fast memory system that holds frequently used data. As processor speeds increase, aggressive design practices raise the probability of faults and of latent errors, because the processor allows only a short duration for each read and write. A fault may corrupt the cache memory system or lead to an erroneous internal CPU state. In this paper, we investigate error propagation in the cache memory system due to transient faults in the cache memory itself, in the processor's registers, or in both. The information gained from such an investigation should lead to the development of more effective error recovery mechanisms against failures due to transient faults arising in the machine's cache memory and register set. We establish that although the computer system is capable of recovering about 50% of the time from the effect of a single erroneous cache location or processor register, the other 50% of the time error recovery is effected only through specific recovery mechanisms. Our results are obtained using both a discrete-time Markov model and error injection on a real system.

Short Title: Proceedings of the Pacific Rim International Symposium on Fault Tolerant Systems, PRFTS