Diploma and Master Theses (authored and supervised):
"Enhanced Performance Testing and Monitoring of JVM-based Distributed Data-Processing Applications";
Supervisor: S. Dustdar, M. Vögler;
Institut für Informationssysteme, Distributed Systems Group,
final examination: 2016-01-11.
In the age of big data with ever-growing data volumes, data-processing applications face considerable performance challenges. If they do not fulfill their performance requirements, they do not deliver their intended benefit to their organization. Therefore, performance testing and monitoring are crucial for organizations, as they enable them to test, analyze and assess the performance of their data-processing applications. Since single machines have not kept up with the growing data volumes, data-processing applications have to scale across clusters, grids or other distributed infrastructures. Although distribution allows such applications to meet their performance requirements, it comes at a cost. Besides the design and manageability challenges that emerge, performance testing and monitoring become more difficult to conduct. This applies especially to data-processing applications for which monitoring was not considered at design time. Testing and monitoring solutions for distributed systems exist. Unfortunately, these tools are often limited in scope: they either focus on certain metrics, such as a server's resource metrics, or are bound to a particular environment or data-processing engine.
The goal of this work is to investigate how the performance of distributed JVM-based data-processing applications can be tested and monitored independently of a particular environment or data-processing engine. The challenges of monitoring a JVM-based distributed data-processing application are analyzed step by step, from defining proper metrics, through data acquisition and publication, to the analysis of measurement data. Based on the result of the analysis, a design is proposed for a framework that allows monitoring and testing of any JVM-based distributed data-processing application. To demonstrate the feasibility of our design, a proof-of-concept implementation of the framework is developed. Finally, to evaluate the framework and show that it serves its purpose, it is applied to a demonstration scenario implemented on both Apache Spark Streaming and Apache Storm; the resulting measurement data is analyzed and the results are discussed.
Created from the Publication Database of the Vienna University of Technology.