Contributions to Proceedings:
A. Kulmukhametov, A. Rauber, C. Becker:
"Improving data quality in large-scale repositories through conflict resolution";
in: "International Journal on Digital Libraries",
Digital repositories rely on technical metadata to manage their objects. The output of characterization tools is aggregated and analyzed through content profiling. Characterization tools vary in accuracy and correctness; they frequently produce contradictory outputs, resulting in metadata conflicts. These conflicts limit scalable preservation risk assessment and repository management. This article presents and evaluates a rule-based approach to improving data quality in this scenario through expert-conducted conflict resolution. We characterize the data quality challenges and present a method for developing conflict resolution rules to improve data quality. We evaluate the method and the resulting data quality improvements in an experiment on a publicly available document collection. The results demonstrate that our approach enables effective conflict resolution: the rules it produces reduce the proportion of conflicts in the data set from 17% to 3%. This replicable method represents a significant improvement in content profiling technology for digital repositories, since the enhanced data quality can improve risk assessment and preservation management in digital repository systems.
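A minimal sketch of the general idea (rule-based resolution of contradictory characterization outputs, and the conflict rate used to measure it) may help readers unfamiliar with content profiling. It is not the authors' implementation: the data model, the tool names `tool_a` and `tool_b`, the `Rule` structure, and the example rule that prefers a specific MIME type over the generic `application/octet-stream` are all hypothetical. The printed rates come from toy data, not from the evaluation described in the abstract.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical data model: for one object, each metadata property maps
# tool name -> value reported by that tool. Tool names are placeholders.
Observations = Dict[str, Dict[str, str]]


@dataclass
class Rule:
    """Hypothetical expert rule: targets one property and, if it matches
    the conflicting reports, returns the single value to keep."""
    prop: str
    matches: Callable[[Dict[str, str]], bool]
    resolve: Callable[[Dict[str, str]], str]


def is_conflict(values: Dict[str, str]) -> bool:
    # A conflict exists when the tools disagree on a property's value.
    return len(set(values.values())) > 1


def apply_rules(obj: Observations, rules: List[Rule]) -> Observations:
    resolved: Observations = {}
    for prop, values in obj.items():
        new_values = values
        if is_conflict(values):
            # The first matching rule for this property wins.
            for rule in rules:
                if rule.prop == prop and rule.matches(values):
                    winner = rule.resolve(values)
                    new_values = {tool: winner for tool in values}
                    break
        resolved[prop] = new_values
    return resolved


def conflict_rate(objects: List[Observations]) -> float:
    # Share of (object, property) pairs that still contain a conflict.
    pairs = [values for obj in objects for values in obj.values()]
    if not pairs:
        return 0.0
    return sum(is_conflict(v) for v in pairs) / len(pairs)


GENERIC = "application/octet-stream"

# Illustrative rule: if one tool reports only the generic MIME type and
# another reports a specific one, keep the specific value.
rules = [
    Rule(
        prop="mimetype",
        matches=lambda v: GENERIC in v.values() and len(set(v.values())) == 2,
        resolve=lambda v: next(x for x in v.values() if x != GENERIC),
    )
]

# Toy collection of two objects, each described by two tools.
objects: List[Observations] = [
    {"mimetype": {"tool_a": "application/pdf", "tool_b": GENERIC},
     "page_count": {"tool_a": "12", "tool_b": "12"}},
    {"mimetype": {"tool_a": "text/html", "tool_b": "text/html"},
     "page_count": {"tool_a": "3", "tool_b": "4"}},
]

after = [apply_rules(obj, rules) for obj in objects]
print(conflict_rate(objects))  # → 0.5
print(conflict_rate(after))    # → 0.25
```

The remaining conflict (the disagreeing page counts) stays unresolved because no rule covers it. This mirrors the point that resolution depends on experts writing rules for specific properties rather than on a blanket "majority vote" or "trust one tool" policy.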
Keywords: Data quality; Technical metadata; Digital curation; Conflict resolution; Content profiling
Created from the Publication Database of the Vienna University of Technology.