Talks and Poster Presentations (with Proceedings-Entry):
N. Rekabsaz, M. Lupu, A. Hanbury, G. Zuccon:
"Exploration of a Threshold for Similarity based on Uncertainty in Word Embedding";
Talk: European Conference on IR Research (ECIR),
- 2017-04-13; in: "Advances in Information Retrieval",
Word embedding promises a quantification of the similarity between terms. However, it is not clear to what extent this similarity value can be of practical use for subsequent information access tasks. In particular, which range of similarity values is indicative of the actual term relatedness? We first observe and quantify the uncertainty of word embedding models with respect to the similarity values they generate. Based on this, we introduce a general threshold which effectively filters related terms. We explore the effect of dimensionality on this general threshold by conducting the experiments in different vector dimensions. Our evaluation on four test collections with four relevance scoring models supports the effectiveness of our approach, as the results of the proposed threshold are significantly better than the baseline while being equal to, or statistically indistinguishable from, the optimal results.
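The filtering step described in the abstract can be illustrated with a minimal sketch, assuming cosine similarity as the similarity measure. The toy vectors, the function names `cosine` and `related_terms`, and the threshold value 0.9 are all hypothetical and chosen only for illustration; the abstract does not state the threshold the authors derive or how they compute it.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def related_terms(query, embeddings, threshold):
    """Return (term, similarity) pairs whose similarity to `query`
    is at least `threshold`, most similar first."""
    q = embeddings[query]
    scored = [(t, cosine(q, v)) for t, v in embeddings.items() if t != query]
    kept = [(t, s) for t, s in scored if s >= threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)

# Toy 3-dimensional vectors (hypothetical, not from a trained model).
EMBEDDINGS = {
    "car": [0.9, 0.1, 0.0],
    "automobile": [0.85, 0.15, 0.05],
    "truck": [0.7, 0.3, 0.1],
    "banana": [0.0, 0.2, 0.9],
}

# Hypothetical threshold; not the value proposed in the paper.
THRESHOLD = 0.9

result = related_terms("car", EMBEDDINGS, THRESHOLD)
for term, score in result:
    print(f"{term}\t{score:.3f}")
```

With these toy vectors, "automobile" and "truck" pass the threshold while "banana" is filtered out. In practice the vectors would come from a trained embedding model, and the threshold would be chosen by a principled procedure such as the one the paper proposes.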
Created from the Publication Database of the Vienna University of Technology.