Contributions to conference proceedings:
A. Lipani, M. Lupu, A. Hanbury, A. Aizawa:
"Verboseness Fission for BM25 Document Length Normalization";
in: "ICTIR '15 Proceedings of the 2015 International Conference on The Theory of Information Retrieval",
ACM,
2015,
ISBN: 978-1-4503-3833-2,
pp. 385-388.
English abstract:
BM25 is probably the best-known term weighting model in Information Retrieval. Depending on the formula variant at hand, it has 2 or 3 parameters (k1, b, and k3). This paper addresses b, the document length normalization parameter. Based on the observation that the two cases previously discussed for length normalization (multi-topicality and verboseness) are actually three (multi-topicality, verboseness with word repetition, i.e. repetitiveness, and verboseness with synonyms), we propose and test a new length normalization method that removes the need for a b parameter in BM25. Testing the new method on a set of purposefully varied test collections, we observe that we can obtain results statistically indistinguishable from the optimal results, therefore removing the need for ground-truth-based optimization.
"Official" electronic version of the publication (according to its Digital Object Identifier, DOI):
http://dx.doi.org/10.1145/2808194.2809486
Electronic version of the publication:
http://publik.tuwien.ac.at/files/PubDat_244472.pdf
Created from the publication database of Technische Universität Wien.