Publications in Scientific Journals:
W. Steiner, M. Paulitsch, H. Kopetz:
"The TTA's Approach to Resilience after Transient Upsets";
The Time-Triggered Architecture (TTA), as an architecture for safety-critical real-time applications, incorporates fault-tolerance mechanisms to ensure correct system operation despite failures. The primary fault hypothesis of the TTA claims to tolerate either the arbitrary failure of any one of its nodes or the passively arbitrary failure of any one of its communication channels. To cover these failure modes, active redundancy techniques are used, that is, nodes and channels are physically replicated. The primary fault hypothesis is, however, not strong enough for certain applications that have to tolerate transient upsets of multiple, possibly all, components in the system. Such a transient upset of the system may break up the synchrony of the nodes and leave disjoint sets of nodes (cliques), each synchronized internally, while the overall synchronization is lost. Although the TTA provides a clique avoidance algorithm that is able to correct a wide class of such multiple transient failures, a stronger algorithm is needed for full coverage. In this paper we discuss a secondary fault hypothesis for the TTA that addresses the transient upset of multiple components and present a new clique resolving algorithm based on the TTA's integrated diagnosis and startup service.
Time-Triggered Architecture, clique resolving, multiple failures, recovery, self-stabilization, startup
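The situation the abstract describes, in which a transient upset leaves disjoint sets of nodes synchronized internally while overall synchronization is lost, can be illustrated with a small Python sketch. This is not the clique avoidance or clique resolving algorithm of the paper. The function name `find_cliques`, the node names, the clock values, and the `precision_us` parameter are all assumptions made for illustration. The sketch only groups nodes by how closely their local clocks agree and reports whether more than one group exists.

```python
from typing import Dict, List


def find_cliques(clock_us: Dict[str, int], precision_us: int) -> List[List[str]]:
    """Group nodes into cliques by local clock value (illustrative only).

    Nodes are sorted by clock value; a new clique starts whenever the gap
    to the previous node's clock exceeds precision_us (a hypothetical
    bound on how far synchronized clocks may differ).
    """
    ordered = sorted(clock_us.items(), key=lambda item: item[1])
    cliques: List[List[str]] = []
    previous_time = None
    for node, local_time in ordered:
        # Start a new clique at the first node or after a gap that is too large.
        if previous_time is None or local_time - previous_time > precision_us:
            cliques.append([])
        cliques[-1].append(node)
        previous_time = local_time
    return cliques


# Hypothetical clock readings (microseconds) after a transient upset.
clocks = {"A": 1000, "B": 1002, "C": 5000, "D": 5001}
cliques = find_cliques(clocks, precision_us=10)

# Overall synchronization is lost when more than one clique exists.
synchrony_lost = len(cliques) > 1
```

With these assumed readings, nodes A and B agree with each other, as do C and D, but the two pairs do not share a common time base, so `synchrony_lost` is true. Resolving such a state is the task the abstract assigns to the new clique resolving algorithm.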
Created from the Publication Database of the Vienna University of Technology.