Scientific Reports:

J. Fabre, T. Losert, E. Marsden, N. Moffat, M. Paulitsch, D. Powell, W. Simmonds:
"Validation of Fault Tolerance and Timing Properties";

English abstract:
This deliverable gathers the work carried out in workpackage WP3 on the validation of dependable systems of systems. This work includes both validation carried out on models of systems and validation carried out by experiments on prototype systems. The deliverable consists of three chapters, devoted respectively to:
validation via model-checking.
Experimental validation via fault-injection.
Experimental validation via performance measurement.
The first chapter considers the use of data-independence methods to improve scalability of model-checking tools. We describe the development and use of a general scaling technique that has the potential to greatly increase the ability of model-checking tools to verify systems of systems. Our focus has been the use of data-independence techniques in the context of CSP. There has long been appeal to informal arguments for considering reduced sets of data when modelling systems in CSP, with the aim of bringing the system within the scope of automated checking (with, e.g., FDR). While the capabilities of the tool have expanded over the years, it will clearly never be possible to explore the state space of most systems both exhaustively and explicitly, so such techniques will remain an essential part of the armoury, together with other reductions such as compressions (approximating semantic minimisation). Work by Lazic and Roscoe aimed to put a formal underpinning in place beneath the more rigorously argued use of reduced data. Despite its clearly perceived potential for application, theirs was deliberately a primarily academic study, and while the language for which the data-independence theorems were proved bears a close family resemblance to CSP, it also has some significant differences. These differences caused us to extend the theory. The fundamental result of the Lazic/Roscoe theory is to exhibit a number of threshold sizes, varying according to criteria on the processes, such that a refinement holds for instantiations of some datatype with that size if and only if it holds for all (larger) sizes. This allows the automated checking of the specific threshold case to serve as a general proof, turning the inability to discover a counterexample into valid grounds for claiming the truth of an assertion. Our work has addressed the need to bridge the gap between that theoretical study and practical application in FDR, thus enabling automatic verification of systems of systems. The second chapter synthesizes the work that has been carried out on the experimental validation of the communications infrastructure of CORBA-based systems. We have identified the error propagation channels in a composite middleware-based architecture comprising multiple hardware and software layers. We present the results of experimental fault injection campaigns targeting a number of CORBA implementations and runtime supports, using two fault models: evaluating the impact of corrupt method invocations, and investigating error propagation from the operating system to the middleware. The work is presented in several phases:
Evaluation of the impact of corrupt IIOP invocations on the ORBacus and TAO implementations of the Name Service.
Impact of corrupt event on two implementations of the Event Service.
Comparative analysis of different ORB versions.
Evaluation of the impact of error propagation from the operating system to the middleware.
Synthesis of the lessons learnt from a dependable system integrator viewpoint.
A main conclusion of this experimental work is that middleware implementations are sensitive to certain forms of unexpected inputs from the environment and from surrounding software layers. This calls for the use of wrapping techniques in order to enhance the robustness of the middleware and the executive software.
Chapter 3 briefly describes a multi-cluster clock synchronization algorithm and its underlying system framework assumptions. This algorithm enables a system-wide global sparse time base, a fundamental for dependable systems of systems with timing constraints. The chapter then presents the validation of performance properties of the algorithm and shows that fault tolerance or fusion algorithms can build upon a globally synchronized time base in multi-cluster systems with sufficiently small accuracy. As the validated accuracy values lie in the order of one microsecond, the algorithm can even be used in distributed control application for cars or airplanes.

Electronic version of the publication:

Created from the Publication Database of the Vienna University of Technology.