Contributions to Proceedings:
S. Bashir, A. Rauber:
"Identification of Low/High Retrievable Patents using Content-Based Features";
in: "Proceedings of the 2nd ACM International workshop on Patent information retrieval (ACM-PAIR'09), Hong Kong, China, 6 November, 2009.",
issued by: Conference on Information and Knowledge Management (ACM-CIKM2009);
New York, NY, USA,
Document retrievability is a measure used in information retrieval to identify the bias of retrieval systems. To measure system bias for a specific document collection, an exhaustive set of queries is processed and the frequency with which each document is retrieved is recorded. To better understand and handle system bias, we need to understand the characteristics of documents that influence retrievability and, ideally, be able to identify documents with high and low retrievability in advance. For this purpose, we identify a number of content-based features that can be used effectively to classify a corpus into documents with low and high retrievability w.r.t. a specific retrieval system. Our experiments on patent collections show that these features can achieve more than 80% classification accuracy on different systems, and hint at the need to combine different retrieval systems to optimize recall.
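The measurement step described in the abstract (process a set of queries and count how often each document is retrieved) can be illustrated with a minimal sketch. Everything named here is an assumption for illustration and is not taken from the paper: the function names `retrievability` and `overlap_search`, the rank cutoff parameter, the toy term-overlap ranker standing in for a real retrieval system, and the three-document sample collection.

```python
def overlap_search(query, docs):
    """Toy ranker (illustrative stand-in for a real retrieval system):
    rank documents by how many distinct query terms they contain."""
    terms = set(query.lower().split())
    scored = []
    for doc_id, text in docs.items():
        score = len(terms & set(text.lower().split()))
        if score > 0:
            scored.append((score, doc_id))
    # Highest score first; ties broken by document id for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in scored]


def retrievability(docs, queries, search, cutoff=1):
    """Count, per document, the number of queries that return it
    within the top `cutoff` ranks (cutoff is a hypothetical parameter)."""
    counts = {doc_id: 0 for doc_id in docs}
    for query in queries:
        for doc_id in search(query, docs)[:cutoff]:
            counts[doc_id] += 1
    return counts


# Hypothetical sample data, not drawn from any patent collection.
docs = {
    "d1": "optical fiber signal amplifier",
    "d2": "fiber cable connector",
    "d3": "chemical compound for coating",
}
queries = ["optical fiber", "fiber cable", "signal amplifier", "fiber connector"]

scores = retrievability(docs, queries, overlap_search, cutoff=1)
print(scores)
```

In this toy setup, `d3` is never retrieved by any query, which is the kind of low-retrievability document the abstract aims to identify in advance. The content-based classification itself is not sketched here.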
Created from the Publication Database of the Vienna University of Technology.