Contributions to Proceedings:
A. Lipani, M. Lupu, A. Hanbury, A. Aizawa:
"Verboseness Fission for BM25 Document Length Normalization";
in: "ICTIR '15 Proceedings of the 2015 International Conference on The Theory of Information Retrieval",
BM25 is probably the most well known term weighting model in Information Retrieval. It has, depending on the formula variant at hand, 2 or 3 parameters (k1, b, and k3). This paper addresses b - the document length normalization parameter. Based on the observation that the two cases previously discussed for length normalization (multi-topicality and verboseness) are actually three: multi-topicality, verboseness with word repetition (repetitiveness) and verboseness with synonyms, we propose and test a new length normalization method that removes the need for a b parameter in BM25. Testing the new method on a set of purposefully varied test collections, we observe that we can obtain results statistically indistinguishable from the optimal results, therefore removing the need for ground-truth based optimization.
"Official" electronic version of the publication (accessed through its Digital Object Identifier - DOI)
Electronic version of the publication:
Created from the Publication Database of the Vienna University of Technology.