Diploma and Master Theses (authored and supervised):
"Performance Analysis of Big Data Tools Based on Benchmarks for Store Sales Forecasting";
Supervisor: I. Brandic;
Institut für Informationssysteme, Distributed Systems Group,
final examination: 2014-10-07.
In the past few years, the volume and variety of Big Data have increased significantly. With the rapidly growing amount of massive unstructured data, processing, storing, aggregating, and analyzing it, and deriving valuable information from it, have become increasingly important.
With the increasing number of Big Data types and sources, the number of enterprise and open source applications, techniques, and resource usage models for big data analysis is constantly rising. Most of them provide basic data mining techniques; however, they still differ significantly in scalability, visualization capabilities, performance, extensibility, and support for various data stores. Although a large body of material describing their strengths and weaknesses is available, it does not give business users an understanding of whether a certain application will fit their real problems in a way that supports effective decision-making.
The main goal of this thesis is to investigate and compare applications and tools for big data analysis, namely Weka, KNIME, Apache Mahout, and R, using several real time series data sets, and to assess their applicability in this context. For this purpose, we develop and apply suitable scenarios that involve various data mining techniques and cover data analysis, data transformation, model accuracy assessment, and visualization. Based on the results obtained, the big data analysis applications are evaluated and compared using multiple quantitative and qualitative measures.
Created from the Publication Database of the Vienna University of Technology.