Talks and Poster Presentations (with Proceedings-Entry):

C. Becker, K. Duretec:
"Free Benchmark Corpora for Preservation Experiments: Using Model-Driven Engineering to Generate Data Sets";
Talk: Joint Conference on Digital Libraries 2013, Indianapolis, Indiana, USA; 2013-07-22 - 2013-07-26; in: "Proceedings of the 13th ACM/IEEE-CS joint conference on Digital libraries", ACM, New York, NY, USA (2013), ISBN: 978-1-4503-2077-1; 349 - 358.



English abstract:
Digital preservation is an active area of research, and recent years have brought forward an increasing number of characterisation tools for the object-level analysis of digital content. However, there is a profound lack of objective, standardised and comparable metrics and benchmark collections to enable experimentation and validation of these tools. While fields such as Information Retrieval have for decades been able to rely on benchmark collections annotated with ground truth to enable systematic improvement of algorithms and systems along objective metrics, the digital preservation field is yet unable to provide the necessary ground truth for such benchmarks. Objective indicators, however, are the key enabler for quantitative experimentation and innovation.
This paper presents a systematic model-driven benchmark generation framework that aims to provide realistic approximations of real-world digital information collections with fully known ground truth that enables systematic quantitative experimentation, measurement and improvement against objective indicators. We describe the key motivation and idea behind the framework, outline the technological building blocks, and discuss results of the generation of page-based and hierarchical documents from a ground truth model. Based on a discussion of the benefits and challenges of the approach, we outline future work.

Keywords:
Repositories; Digital Preservation; Characterisation; Benchmark; Data Set; Ground Truth; Corpora; Model Driven Engineering


"Official" electronic version of the publication (accessed through its Digital Object Identifier - DOI):
http://dx.doi.org/10.1145/2467696.2467719

Electronic version of the publication:
http://publik.tuwien.ac.at/files/PubDat_223168.pdf


Created from the Publication Database of the Vienna University of Technology.