

Contributions to Proceedings:

A. Lipani, M. Lupu, A. Hanbury, A. Aizawa:
"Verboseness Fission for BM25 Document Length Normalization";
in: "ICTIR '15 Proceedings of the 2015 International Conference on The Theory of Information Retrieval", ACM, 2015, ISBN: 978-1-4503-3833-2, pp. 385-388.



English abstract:
BM25 is probably the most well known term weighting model in Information Retrieval. It has, depending on the formula variant at hand, 2 or 3 parameters (k1, b, and k3). This paper addresses b - the document length normalization parameter. Based on the observation that the two cases previously discussed for length normalization (multi-topicality and verboseness) are actually three: multi-topicality, verboseness with word repetition (repetitiveness) and verboseness with synonyms, we propose and test a new length normalization method that removes the need for a b parameter in BM25. Testing the new method on a set of purposefully varied test collections, we observe that we can obtain results statistically indistinguishable from the optimal results, therefore removing the need for ground-truth based optimization.
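For orientation, the role of b that the abstract refers to can be sketched with a commonly used textbook form of BM25. This is not the normalization method proposed in the paper. It is the conventional formula with the b parameter that the paper aims to remove. The function name, the default values k1=1.2 and b=0.75, and the particular IDF variant are assumptions chosen for illustration, and the query-side parameter k3 is left out.

```python
import math


def bm25_term_score(tf, doc_len, avg_doc_len, df, n_docs, k1=1.2, b=0.75):
    """Score one query term against one document (illustrative BM25 variant).

    tf          -- occurrences of the term in the document
    doc_len     -- length of the document in tokens
    avg_doc_len -- average document length in the collection
    df          -- number of documents containing the term
    n_docs      -- number of documents in the collection
    k1, b       -- assumed default values, not taken from the paper
    """
    # Smoothed IDF; the "1 +" inside the log keeps it non-negative.
    idf = math.log(1.0 + (n_docs - df + 0.5) / (df + 0.5))
    # Length normalization: b = 0 ignores document length entirely,
    # b = 1 scales the saturation point by doc_len / avg_doc_len.
    norm = 1.0 - b + b * (doc_len / avg_doc_len)
    # Term-frequency saturation controlled by k1.
    return idf * tf * (k1 + 1.0) / (tf + k1 * norm)


if __name__ == "__main__":
    short = bm25_term_score(tf=3, doc_len=100, avg_doc_len=200, df=50, n_docs=1000)
    long_ = bm25_term_score(tf=3, doc_len=400, avg_doc_len=200, df=50, n_docs=1000)
    print(short > long_)  # the same tf counts for less in the longer document
```

In this form, b is normally tuned per collection against relevance judgments, which is the ground-truth-based optimization the abstract says the proposed method makes unnecessary.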


"Official" electronic version of the publication (accessed through its Digital Object Identifier - DOI)
http://dx.doi.org/10.1145/2808194.2809486

Electronic version of the publication:
http://publik.tuwien.ac.at/files/PubDat_244472.pdf


Created from the Publication Database of the Technische Universität Wien.