F. Endel, H. Piringer:
"Data Wrangling: Making data useful again";
Abstract: Data analysis has become an everyday business, and advancements in data management routines open up new opportunities. Nevertheless, transforming and assembling newly acquired data into a suitable form remains tedious. It is often stated that data cleaning is a critical part of the overall process, but one that also consumes substantial amounts of time and resources.
Data wrangling is not only about transformation and cleaning procedures. Many other aspects, such as data quality, merging of different sources, reproducible processes, and managing data provenance, have to be considered. Although various tools designed for specific tasks are available, software solutions that accompany the whole process are still rare.
This paper describes some aspects of this first phase of most data-driven projects, also known as data wrangling, data munging, or janitorial work. Beginning with an overview of the topic and current problems, it discusses concrete common tasks as well as selected software solutions and techniques.
Keywords: Data acquisition, Databases, Bad data identification, Data wrangling
Generated from the publication database of the Technische Universität Wien.