[Back]


Scientific Reports:

U. Schmid:
"Failure Model Coverage under Transient Link Failures";
Report for Research Report 2/2004, Technische Universität Wien, Institut für Technische Informatik, Treitlstraße 3, A-1040 Vienna, Austria; 2004.



English abstract:
In a number of former papers, we analyzed several consensus and Byzantine agreement algorithms under a novel hybrid failure model for synchronous distributed systems. It extends traditional process failure models by allowing every process in the system to commit up to $ls$ send link failures and experience up to $lr$ receive link failures per round, without being considered (process-)faulty. The present paper shows that this model---and hence our algorithms---can also be applied in systems with high transient link failures rates: Assuming that every link may fail independently with some probability $p$ in every round, we derive the probability that the link failure bounds $ls$, $lr$ are respected during the entire execution of some communications-expensive Byzantine agreement algorithm. This probability can in fact be made (almost) as large as desired by increasing $lr$ and $ls$ appropriately, even though $n$ and hence the number of links that could fail systemwide increases as well. It turns out that our algorithms could even be applied in wireless systems, where link loss probabilities up to $p=10^{-2}$ are common.
Bibtex


Electronic version of the publication:
http://www.vmars.tuwien.ac.at/php/pserver/docdetail.php?DID=1320&viewmode=paper&year=2004