A. Rauber, W. Merkl:
"Text Mining in the SOMLib Digital Library System: The Representation of Topics and Genres";
With the increasing amount of textual information available in electronic
form, more powerful methods are needed for exploring, searching, and
organizing this mass of information.
This paper presents the SOMLib digital library system, built on neural
networks to provide text mining capabilities. At its foundation we use the
self-organizing map to provide content-based clustering of documents. By
using an extended model, i.e. the growing hierarchical self-organizing map,
we can further detect subject hierarchies in a document collection, with
the neural network adapting its size and structure automatically during its
unsupervised training process to reflect the topical hierarchy.
By mining the weight vector structure of the trained maps, our system is
able to select keywords describing the various topical clusters.
Text mining has to incorporate more than the mere analysis of content.
Structural and genre information are key in organizing and locating
information. Using color-coding techniques, we can integrate a structural
analysis of documents based on self-organizing maps into the subject-based
clustering, relying on metaphor graphics for intuitive visualization.
We demonstrate the capabilities of the SOMLib system using collections of
articles from various newspapers and magazines.
Keywords: Document Clustering, Self-Organizing Map (SOM), Genre Analysis,
Metaphor Graphics, Digital Libraries.
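
The content-based clustering and keyword selection described in the abstract can be illustrated with a minimal sketch. It is not the SOMLib implementation: it trains a plain fixed-size self-organizing map (not the growing hierarchical variant) on toy term-weight vectors, and it labels each map unit simply by the terms with the largest weight values, which is an assumed simplification of the keyword selection the abstract mentions. All names (train_som, best_unit, unit_keywords), the toy vocabulary, the document vectors, and the training parameters are hypothetical.

```python
import math
import random


def best_unit(weights, doc):
    """Return the grid position of the unit whose weight vector is closest to doc."""
    return min(weights,
               key=lambda u: sum((a - b) ** 2 for a, b in zip(weights[u], doc)))


def train_som(docs, rows, cols, epochs=200, lr0=0.5, seed=0):
    """Train a small SOM on document term vectors (lists of floats)."""
    rng = random.Random(seed)
    dim = len(docs[0])
    radius0 = max(rows, cols) / 2.0
    # One randomly initialised weight vector per map unit, keyed by (row, col).
    weights = {(r, c): [rng.random() for _ in range(dim)]
               for r in range(rows) for c in range(cols)}
    for t in range(epochs):
        frac = t / epochs
        lr = lr0 * (1.0 - frac)                    # learning rate decays linearly
        radius = max(radius0 * (1.0 - frac), 0.5)  # neighbourhood shrinks, floor 0.5
        doc = docs[rng.randrange(len(docs))]       # pick one document at random
        bmu = best_unit(weights, doc)
        for unit, w in weights.items():
            grid_d2 = (unit[0] - bmu[0]) ** 2 + (unit[1] - bmu[1]) ** 2
            h = math.exp(-grid_d2 / (2.0 * radius * radius))  # Gaussian neighbourhood
            for i in range(dim):
                w[i] += lr * h * (doc[i] - w[i])   # pull unit toward the document
    return weights


def unit_keywords(weights, vocab, top_n=2):
    """Label each unit with the top_n terms that have the largest weight values."""
    labels = {}
    for unit, w in weights.items():
        ranked = sorted(range(len(vocab)), key=lambda i: -w[i])
        labels[unit] = [vocab[i] for i in ranked[:top_n]]
    return labels


# Toy data (assumed): two sports-like and two politics-like documents.
vocab = ["goal", "match", "vote", "party"]
docs = [
    [1.0, 0.8, 0.0, 0.0],
    [0.9, 1.0, 0.1, 0.0],
    [0.0, 0.1, 1.0, 0.9],
    [0.0, 0.0, 0.8, 1.0],
]

som = train_som(docs, rows=1, cols=2)
assignments = [best_unit(som, d) for d in docs]
labels = unit_keywords(som, vocab)
for doc_index, unit in enumerate(assignments):
    print(doc_index, unit, labels[unit])
```

In this sketch, documents with similar term profiles end up on the same map unit, and the unit's label is read directly from its weight vector. A real system would work with high-dimensional term vectors extracted from full texts and a considerably larger map.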
Generated from the publication database of the Technische Universität Wien.