

Talks and poster presentations (with proceedings entry):

T. Hassan:
"GraphWrap - A System for Interactive Wrapping of PDF Documents Using Graph Matching Techniques";
Poster: The 9th ACM Symposium on Document Engineering (DocEng'09), Munich, Germany; 15.09.2009 - 18.09.2009; in: "Proc. of the 2009 ACM Symposium on Document Engineering", U. M. Borghoff, B. Chidlovskii, S. Rönnau (eds.); (2009), ISBN: 978-1-60558-575-8; pp. 247 - 248.



English abstract:
We present GraphWrap, a novel and innovative approach to wrapping PDF documents. The PDF format is often used to publish large amounts of structured data, such as product specifications, measurements, prices or contact information. As the PDF format is unstructured, it is very difficult to use this data in machine processing applications. Wrapping is the process of navigating the data source, semiautomatically extracting the data and transforming it into a structured form. GraphWrap enables a non-expert user to create such data extraction programs for almost any PDF file in an intuitive and interactive manner. We show how a wrapper can be created by selecting an example instance and interacting with the graph representation to set conditions and choose which data items to extract. In the background, the corresponding instances are found using an algorithm based on subgraph isomorphism. The resulting wrapper can then be run on other pages and documents which exhibit a similar visual structure.
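
Illustration (not part of the original record): the abstract states that matching instances are found with an algorithm based on subgraph isomorphism. The sketch below shows the general idea only, using a plain backtracking search over a small labelled graph. It is not the GraphWrap implementation; the function name `find_instances`, the node labels ("label", "value") and the edge relations ("left_of", "above") are hypothetical and chosen purely for this example.

```python
from typing import Dict, List, Tuple

# An edge is (source node, relation, target node). All names are hypothetical.
Edge = Tuple[str, str, str]


def find_instances(
    pattern_nodes: Dict[str, str],
    pattern_edges: List[Edge],
    doc_nodes: Dict[str, str],
    doc_edges: List[Edge],
) -> List[Dict[str, str]]:
    """Return every injective mapping from pattern nodes to document nodes
    that preserves node labels and all labelled pattern edges."""
    doc_edge_set = set(doc_edges)
    order = list(pattern_nodes)
    results: List[Dict[str, str]] = []

    def consistent(mapping: Dict[str, str]) -> bool:
        # Every pattern edge whose endpoints are both mapped must also
        # exist between the corresponding document nodes.
        return all(
            (mapping[s], rel, mapping[t]) in doc_edge_set
            for s, rel, t in pattern_edges
            if s in mapping and t in mapping
        )

    def extend(i: int, mapping: Dict[str, str]) -> None:
        if i == len(order):
            results.append(dict(mapping))
            return
        p = order[i]
        for d, label in doc_nodes.items():
            # Labels must agree and a document node may be used only once.
            if label != pattern_nodes[p] or d in mapping.values():
                continue
            mapping[p] = d
            if consistent(mapping):
                extend(i + 1, mapping)
            del mapping[p]

    extend(0, {})
    return results


# Assumed toy page: text blocks as nodes, spatial relations as edges.
doc_nodes = {"t1": "label", "t2": "value", "t3": "label",
             "t4": "value", "t5": "label"}
doc_edges = [
    ("t1", "left_of", "t2"),
    ("t3", "left_of", "t4"),
    ("t1", "above", "t3"),
    ("t2", "above", "t4"),
    ("t3", "above", "t5"),
]

# Pattern derived from one selected example: a label with a value to its right.
pattern_nodes = {"a": "label", "b": "value"}
pattern_edges = [("a", "left_of", "b")]

matches = find_instances(pattern_nodes, pattern_edges, doc_nodes, doc_edges)
```

Here `matches` contains one mapping per label/value pair on the toy page, while the unpaired block `t5` is skipped. Exhaustive backtracking is exponential in the worst case, so a practical system would need pruning or a dedicated matcher.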


Associated projects:
Project lead Reinhard Pichler:
GraphWrap - Graph-Based Wrapping from PDF Documents


Generated from the publication database of the Technische Universität Wien.