Doctor's Theses (authored and supervised):
"Analysis of Common Cause Faults in Dual Core Architectures";
Supervisor, Reviewer: A. Steininger, Z. Kotasek;
Institut für Technische Informatik,
oral examination: 2009-10-20.
Duplication and comparison has proven to be an efficient method for error detection using
increased redundancy. Based on this generic principle, dual core processor architectures
with output comparison are used for safety-critical applications. Placing two instances of
the same (arbitrary) processor on one die yields a very cost-efficient single-chip
implementation of this principle. At the same time, however, the physical coupling of the
two replicas creates the potential for certain types of faults to affect both cores in the
same way, such that the mutual checking fails. Failures resulting from this class of
malicious coupling are called common cause failures (CCFs); they are a major factor when
calculating a redundant system's probability of failure. Since major safety standards
predict very high CCF rates for single-chip architectures, the question is how realistic
these rates are when an extremely fast error detection mechanism, such as the one found in
the presented dual core architecture, is used.
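To make the detection principle and its blind spot concrete, here is a minimal sketch that models each "core" as a plain Python function and the comparator as an equality check on the two outputs. The names `duplicate_and_compare`, `fault_free` and `with_bit_flip` and the single-bit output fault are hypothetical illustrations, not part of the thesis. A fault in one replica is caught, while the same fault in both replicas yields a wrong output with no error flag, which is exactly the common cause case.

```python
from typing import Callable, Tuple

# Hypothetical model: a "core" is any pure function from an input word to an output word.
Core = Callable[[int], int]


def duplicate_and_compare(core_a: Core, core_b: Core, x: int) -> Tuple[int, bool]:
    """Feed the same input to both replicas and compare their outputs.

    Returns core A's output together with an error flag that is raised
    whenever the two outputs differ (i.e. the comparator detects an error).
    """
    out_a = core_a(x)
    out_b = core_b(x)
    return out_a, out_a != out_b


def fault_free(x: int) -> int:
    # Stand-in for the processor's intended computation (assumption for illustration).
    return 3 * x + 1


def with_bit_flip(core: Core, bit: int) -> Core:
    # Wrap a core so that one output bit is inverted (a simple assumed fault model).
    return lambda x: core(x) ^ (1 << bit)


faulty = with_bit_flip(fault_free, 2)

# Fault in one replica only: outputs differ, so the comparator flags it.
print(duplicate_and_compare(fault_free, faulty, 5))  # (16, True)
# Same fault in both replicas (common cause): wrong output, no error flag.
print(duplicate_and_compare(faulty, faulty, 5))  # (20, False)
```

The comparator cannot distinguish "both correct" from "both wrong in the same way", which is why the coupling between the replicas matters.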
One key motivation is to find out how this type of coverage leakage relates to other
imperfections of the duplication and comparison approach that would also be found when
using two cores on separate dies (such as coupling over a common power supply or clock).
To this end, the thesis aims at (i) finding a suitable model that can describe the effect
of CCFs on a dual core architecture using fast error detection, (ii) finding ways to
quantify this model through analysis or experimental data by splitting the problem into
different coupling factors, and (iii) finding possible countermeasures against such CCFs.
First, this thesis elaborates a model and analyzes several of the relevant physical
coupling mechanisms in order to decompose the genesis of a common cause fault into several
steps. This model is derived from the well-known beta factor model found in the literature
and amended by said coupling mechanisms and error detection mechanisms. The model serves
as a starting point for quantifying the risk of CCFs on a dual core processor.
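For readers unfamiliar with it, the standard beta factor model attributes a fraction beta of a channel's failure rate lambda to common cause failures and the remaining (1 - beta) share to independent failures. The sketch below encodes that relation. The function `beta_from_steps` is a hypothetical illustration of "decomposing the genesis of a CCF into steps": it simply multiplies assumed conditional step probabilities and is not the thesis' actual model. All numbers in the usage example are made up.

```python
from math import prod
from typing import Sequence


def ccf_rate(lambda_channel: float, beta: float) -> float:
    """Beta factor model: the fraction beta of a channel's failure rate
    lambda_channel is attributed to common cause failures."""
    if not 0.0 <= beta <= 1.0:
        raise ValueError("beta must lie in [0, 1]")
    return beta * lambda_channel


def independent_rate(lambda_channel: float, beta: float) -> float:
    """The remaining (1 - beta) share of the failure rate, treated as independent."""
    if not 0.0 <= beta <= 1.0:
        raise ValueError("beta must lie in [0, 1]")
    return (1.0 - beta) * lambda_channel


def beta_from_steps(step_probabilities: Sequence[float]) -> float:
    """HYPOTHETICAL decomposition: treat beta as the product of conditional
    probabilities of successive steps, e.g. the fault couples into the second
    core, the fault effects coincide in time and location, and the comparator
    misses them. Illustrative only; not the thesis' actual model."""
    for p in step_probabilities:
        if not 0.0 <= p <= 1.0:
            raise ValueError("each step probability must lie in [0, 1]")
    return prod(step_probabilities)


# Made-up numbers purely for illustration (rates per hour).
beta = beta_from_steps([0.1, 0.01, 0.5])
print(beta, ccf_rate(1e-6, beta), independent_rate(1e-6, beta))
```

The appeal of a product form is that each factor can, in principle, be estimated separately by analysis or experiment, and a single small factor (for example a very low probability of coincidence) keeps beta small.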
As a next step, the probability of error detection after a fault has coupled into both
cores of the dual core architecture is evaluated experimentally. Gate-level fault injection
is used to inject faults into both cores with a defined time offset. These experiments show
that very tight local and temporal coincidence of the fault effects in both replicas is a
crucial prerequisite for a common cause fault. Based on this quantitative input, the
decomposition model shows that the risk of common cause faults is low for physical coupling
mechanisms with relatively slow propagation speed, such as thermal and mechanical effects.
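The role of local and temporal coincidence can be illustrated with a toy model. The sketch below runs two replicas of an 8-bit accumulator in lockstep, inverts one state bit in each replica at a chosen cycle, and compares the states after every cycle as a stand-in for fast error detection. This is an assumed toy, not the gate-level netlists used in the thesis; `run_lockstep` and the `(cycle, bit)` fault descriptor are hypothetical.

```python
from typing import Optional, Sequence, Tuple

# Hypothetical fault descriptor: (cycle, bit) at which one state bit is inverted.
Fault = Optional[Tuple[int, int]]


def run_lockstep(inputs: Sequence[int], fault_a: Fault, fault_b: Fault,
                 width: int = 8) -> Tuple[bool, Optional[int]]:
    """Run two replicas of a toy accumulator in lockstep and compare their
    states after every cycle. Returns (detected, cycle_of_detection)."""
    mask = (1 << width) - 1
    state_a = state_b = 0
    for cycle, x in enumerate(inputs):
        # Fault-free behaviour: accumulate the input modulo 2**width.
        state_a = (state_a + x) & mask
        state_b = (state_b + x) & mask
        # Inject the single-bit faults at their respective cycles.
        if fault_a is not None and fault_a[0] == cycle:
            state_a ^= 1 << fault_a[1]
        if fault_b is not None and fault_b[0] == cycle:
            state_b ^= 1 << fault_b[1]
        # Comparator checks every cycle (assumed fast error detection).
        if state_a != state_b:
            return True, cycle
    return False, None


inputs = list(range(1, 9))
print(run_lockstep(inputs, (3, 2), (3, 2)))  # same cycle, same bit: (False, None)
print(run_lockstep(inputs, (3, 2), (4, 2)))  # one cycle apart: (True, 3)
print(run_lockstep(inputs, (3, 2), (3, 5)))  # same cycle, different bit: (True, 3)
```

Because the comparator checks every cycle, any non-zero offset or mismatched location is caught in the cycle the first fault appears; only the perfectly coincident case escapes. In this toy model, delaying one replica by a fixed number of cycles (the intuition behind time diversity) would turn a simultaneous disturbance of both cores into such a non-coincident pair.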
Furthermore, electrical faults showing immediate coupling are analyzed using power supply
disturbance fault injection. The experiments are first carried out on an FPGA architecture
for an in-depth analysis of fault effects, and then on an ASIC architecture to show the
general validity of the approach. The fault injection experiments show that delay faults
occurring on the critical path are the most dangerous class of faults. In addition, several
countermeasures such as time diversity, parity protection, voltage monitoring and increased
timing margins are analyzed.
Finally, these countermeasures are brought into context with the coupling mechanisms
determined for the CCF model. In a practical application, these countermeasures serve as a
starting point for a strategy to counter CCFs in single-chip solutions.
Dual Core Architectures, Common Cause Faults
Project Head Andreas Steininger:
Created from the Publication Database of the Vienna University of Technology.