Diploma and Master Theses (authored and supervised):
"Ausfallsicherheitsmechanismen in Datenstromverarbeitungssystemen";
Supervisors: S. Schulte, C. Hochreiner;
Institute of Information Systems Engineering, Distributed Systems Group,
final examination: 2018-02-26.
Stream processing is a practice where continuous data streams are processed and aggregated in near real-time, ultimately resulting in the discovery of new information. Stream processing applications (SPAs) are used to analyse data streams and are often deployed in a distributed manner for performance reasons. When faced with partial failures or network communication outages, fault tolerance mechanisms must ensure continuous operation. Due to the near-real-time requirements, these mechanisms have to balance the need for consistency (i.e., producing correct results) and availability (i.e., producing results fast enough) when failures occur, since fulfilling both at the same time is impossible. The key concept of fault tolerance is redundancy. Existing fault tolerance approaches for SPAs implement redundancy by replicating operators, the building blocks of an SPA. We argue that this approach is not sufficient and present a novel fault tolerance model which focuses on functional redundancy on the level of paths (sequences of operators). Based on a concrete motivating scenario, we identify the requirements for Pathfinder, our new fault tolerance framework, and evaluate it against that scenario. Pathfinder addresses the shortcomings of existing approaches by allowing SPA developers to specify functional redundancy. At runtime, Pathfinder reacts to faults by switching to a fault-free path with similar functionality. To restore the main path once the failed operator has recovered, Pathfinder uses the circuit breaker pattern, which has proven itself in the domain of microservices. By comparing our approach to fully redundant replication, we show that 30% of total operational costs can be saved while achieving a similar level of availability. Finally, several experiments show that Pathfinder's failure detection and fault tolerance mechanisms work as expected and add only minimal performance overhead.
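The path switching and recovery behaviour described in the abstract can be illustrated with a minimal sketch of the generic circuit breaker pattern. This is not Pathfinder's actual implementation: the names `CircuitBreaker`, `process`, `failure_threshold` and `reset_timeout`, as well as the modelling of a path as a plain Python callable, are assumptions made purely for illustration.

```python
import time
from typing import Any, Callable, Optional, Tuple


class CircuitBreaker:
    """Textbook circuit breaker (hypothetical, not taken from the thesis).

    CLOSED    -> the main path is used.
    OPEN      -> the main path is skipped until reset_timeout has elapsed.
    HALF_OPEN -> one trial call on the main path decides whether to close.
    """

    def __init__(self, failure_threshold: int = 3, reset_timeout: float = 5.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock                      # injectable for testing
        self.failures = 0
        self.opened_at: Optional[float] = None  # None means CLOSED

    @property
    def state(self) -> str:
        if self.opened_at is None:
            return "CLOSED"
        if self.clock() - self.opened_at >= self.reset_timeout:
            return "HALF_OPEN"
        return "OPEN"

    def record_success(self) -> None:
        # A successful call on the main path closes the breaker again.
        self.failures = 0
        self.opened_at = None

    def record_failure(self) -> None:
        # Open (or re-open) the breaker once enough failures were observed.
        self.failures += 1
        if self.failures >= self.failure_threshold:
            self.opened_at = self.clock()


def process(item: Any,
            main_path: Callable[[Any], Any],
            backup_path: Callable[[Any], Any],
            breaker: CircuitBreaker) -> Tuple[str, Any]:
    """Send an item down the main path unless its breaker is open;
    otherwise (or on failure) use the functionally similar backup path."""
    if breaker.state != "OPEN":
        try:
            result = main_path(item)
            breaker.record_success()
            return "main", result
        except Exception:
            breaker.record_failure()
    return "backup", backup_path(item)
```

In this sketch, a path is any callable that either returns a result or raises an exception. After `failure_threshold` consecutive failures the main path is bypassed entirely, and once `reset_timeout` has elapsed a single trial item is sent down the main path again to check whether the failed operator has recovered.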
Created from the Publication Database of the Vienna University of Technology.