M. Lupu:
"On the Usability of Random Indexing in Patent Retrieval";
Talk: International Conference on Conceptual Structures (ICCS), Iasi, Romania; 27.07.2014 - 30.07.2014; in: "Proceedings of the International Conference on Conceptual Structures (ICCS)", Springer LNCS, 8577 (2014), ISBN: 978-3-319-08389-6; pp. 202 - 216.

Statistical semantics methods are fairly controversial in the IR community, mostly because they are unstable and difficult to debug. At the same time, they are extremely tempting, perhaps in the same way that Artificial Intelligence was in the 1960s. Back then, it took a few decades for the hype to pass and for us to learn the real utility and limits of the great technologies developed earlier. This paper takes an exhaustive view of the performance and utility of one particular statistical semantics method, Random Indexing, in the context of difficult texts. After more than a year of CPU time spent on experiments, we provide a global view of the behaviour of the method on a particularly challenging test collection based on patent data. Finally, we observe interesting patterns emerging in the semantic space created by the method, which we hypothesize to be the cause of the behaviour observed in the experiments.

patent, statistical semantics, information retrieval, text similarity

