

Contributions to proceedings:

A. Kulmukhametov, A. Rauber, C. Becker:
"Improving data quality in large-scale repositories through conflict resolution";
in: "International Journal on Digital Libraries", Springer-Verlag, 2021, pp. 365-383.



Abstract (English):
Digital repositories rely on technical metadata to manage their objects. The output of characterization tools is aggregated and analyzed through content profiling. The accuracy and correctness of characterization tools vary; they frequently produce contradicting outputs, resulting in metadata conflicts that limit scalable preservation risk assessment and repository management. This article presents and evaluates a rule-based approach to improving data quality in this scenario through expert-conducted conflict resolution. We characterize the data quality challenges and present a method for developing conflict resolution rules to improve data quality. We evaluate the method and the resulting data quality improvements in an experiment on a publicly available document collection. The results demonstrate that our approach enables the effective resolution of conflicts by producing rules that reduce the share of conflicts in the data set from 17% to 3%. This replicable method presents a significant improvement in content profiling technology for digital repositories, since the enhanced data quality can improve risk assessment and preservation management in digital repository systems.

Keywords:
Data quality, Technical metadata, Digital curation, Conflict resolution, Content profiling


"Official" electronic version of the publication (according to its Digital Object Identifier - DOI)
http://dx.doi.org/10.1007/s00799-021-00311-0

Electronic version of the publication:
https://link.springer.com/article/10.1007%2Fs00799-021-00311-0


Created from the Publication Database of the Technische Universität Wien.