

Diploma and Master Theses (authored and supervised):

B. Knasmüller:
"Ausfallsicherheitsmechanismen in Datenstromverarbeitungssystemen" [Fault Tolerance Mechanisms in Data Stream Processing Systems];
Supervisor: S. Schulte, C. Hochreiner; Institute of Information Systems Engineering, Distributed Systems Group, 2018; final examination: 2018-02-26.



English abstract:
Stream processing is a practice in which continuous data streams are processed and aggregated in near real-time, ultimately resulting in the discovery of new information. Stream processing applications (SPAs) are used to analyse data streams and are often deployed in a distributed manner for performance reasons. When faced with partial failures or network communication outages, fault tolerance mechanisms must ensure continuous operation. Due to the near-real-time requirements, these mechanisms have to balance the need for consistency (i.e., producing correct results) against availability (i.e., producing results fast enough) in case of failures, since fulfilling both at the same time is impossible. The key concept of fault tolerance is redundancy. Existing fault tolerance approaches for SPAs implement redundancy by replicating operators, the building blocks of an SPA. We argue that this approach is not sufficient and present a novel fault tolerance model that focuses on functional redundancy at the level of paths (sequences of operators). Based on a concrete motivational scenario, we identify the requirements for Pathfinder, our new fault tolerance framework, and evaluate it against that scenario. Pathfinder addresses the shortcomings of existing approaches by allowing SPA developers to specify functional redundancy. At runtime, Pathfinder reacts to faults by switching to a fault-free path with similar functionality. To restore the main path once the failed operator has recovered, Pathfinder uses the circuit breaker pattern, which has proven itself in the domain of microservices. By comparing our approach to fully redundant replication, we show that 30% of total operational costs can be saved while achieving a similar level of availability. Finally, several experiments show that Pathfinder's failure detection and fault tolerance mechanisms work as expected and add only minimal performance overhead.
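
The path switching and circuit breaker behaviour described in the abstract can be illustrated with a minimal Python sketch. This is not Pathfinder's actual implementation or API: the class name PathCircuitBreaker, the parameters failure_threshold and retry_after, and the modelling of a path as a plain callable are all assumptions made for illustration. The sketch shows a breaker that routes items to a main path, switches to a functionally similar fallback path after repeated failures, and probes the main path again after a waiting period.

```python
import time


class PathCircuitBreaker:
    """Hypothetical sketch: route items to a main path, fall back on failure."""

    def __init__(self, main_path, fallback_path,
                 failure_threshold=3, retry_after=5.0, clock=time.monotonic):
        self.main_path = main_path          # callable modelling the main path
        self.fallback_path = fallback_path  # callable with similar functionality
        self.failure_threshold = failure_threshold
        self.retry_after = retry_after      # seconds before probing the main path
        self.clock = clock                  # injectable so tests can control time
        self.failures = 0
        self.opened_at = None               # None means the breaker is closed

    @property
    def state(self):
        if self.opened_at is None:
            return "closed"
        if self.clock() - self.opened_at >= self.retry_after:
            return "half-open"
        return "open"

    def process(self, item):
        # Open: skip the main path entirely and use the fallback path.
        if self.state == "open":
            return self.fallback_path(item)
        # Closed or half-open: try the main path.
        try:
            result = self.main_path(item)
        except Exception:
            self._record_failure()
            return self.fallback_path(item)
        # Success closes the breaker and restores the main path.
        self.failures = 0
        self.opened_at = None
        return result

    def _record_failure(self):
        self.failures += 1
        # A failed half-open probe re-opens the breaker immediately;
        # in the closed state it opens once the threshold is reached.
        if self.opened_at is not None or self.failures >= self.failure_threshold:
            self.opened_at = self.clock()
```

In this sketch a single successful probe in the half-open state is enough to restore the main path; a real framework might require several successes or use a separate health check instead.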

Created from the Publication Database of the Vienna University of Technology.