Talks and Poster Presentations (with Proceedings Entry):

T. Hassan:
"User-Guided Wrapping of PDF Documents Using Graph Matching Techniques";
Talk: 10th Int. Conference on Document Analysis and Recognition, Barcelona, Spain; 26.07.2009 - 29.07.2009; in: "Proc. of the 10th Int. Conf. on Document Analysis and Recognition", A. Apostolos, M. Cheriet, U. Pal (eds.); (2009), ISBN: 978-0-7695-3725-2; pp. 631 - 635.



Abstract (English):
There are a number of established products on the market for wrapping (semi-automatic navigation and extraction of data) from web pages. These solutions make use of the inherent structure of HTML to locate instances of data to be wrapped. As PDF documents do not have such a structure, wrapping PDF documents has long been recognized as a challenging problem. We have developed a novel system for wrapping PDF documents, which is currently at a prototype stage. A PDF document is represented as an attributed relational graph, in which nodes represent physical items on the page and edges represent spatial and logical relationships. A wrapper is defined as a subgraph of the document with additional conditions, and can quickly and intuitively be created by a non-expert using the GUI. An algorithm based on subgraph isomorphism is then used to find the data instances and extract the required data. Experiments show that our approach achieves good results with good execution time.
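
The idea of matching a wrapper against a document graph can be illustrated with a minimal Python sketch. It is not the system described in the abstract: the class names (Graph, Wrapper), the relation label "left_of", the node attributes, and the plain backtracking search are all assumptions made for illustration. The search only checks that every wrapper edge is present in the document (a non-induced subgraph match), which may differ from the matching used in the paper.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterator, Tuple


@dataclass
class Graph:
    """Attributed relational graph: node attributes plus labelled directed edges."""
    nodes: Dict[str, dict] = field(default_factory=dict)
    edges: Dict[Tuple[str, str], str] = field(default_factory=dict)

    def add_node(self, name: str, **attrs) -> None:
        self.nodes[name] = attrs

    def add_edge(self, src: str, dst: str, relation: str) -> None:
        # (src, dst, "left_of") reads as "src is left of dst".
        self.edges[(src, dst)] = relation


@dataclass
class Wrapper:
    """Hypothetical wrapper format: one predicate per pattern node, plus required edges."""
    conditions: Dict[str, Callable[[dict], bool]]
    edges: Dict[Tuple[str, str], str]


def find_instances(doc: Graph, wrapper: Wrapper) -> Iterator[Dict[str, str]]:
    """Yield every injective mapping of wrapper nodes onto document nodes
    that satisfies all node conditions and all wrapper edges."""
    pattern_nodes = list(wrapper.conditions)

    def consistent(mapping: Dict[str, str], p_node: str, d_node: str) -> bool:
        if d_node in mapping.values():  # keep the mapping injective
            return False
        if not wrapper.conditions[p_node](doc.nodes[d_node]):
            return False
        trial = {**mapping, p_node: d_node}
        # Every wrapper edge whose endpoints are both mapped must exist
        # in the document with the same relation label.
        for (a, b), relation in wrapper.edges.items():
            if a in trial and b in trial:
                if doc.edges.get((trial[a], trial[b])) != relation:
                    return False
        return True

    def extend(mapping: Dict[str, str], depth: int) -> Iterator[Dict[str, str]]:
        if depth == len(pattern_nodes):
            yield dict(mapping)
            return
        p_node = pattern_nodes[depth]
        for d_node in doc.nodes:
            if consistent(mapping, p_node, d_node):
                mapping[p_node] = d_node
                yield from extend(mapping, depth + 1)
                del mapping[p_node]

    yield from extend({}, 0)


def is_number(text: str) -> bool:
    try:
        float(text)
        return True
    except ValueError:
        return False


# Toy page: a heading and two "Price:" labels, each followed by a value.
doc = Graph()
doc.add_node("t1", text="Price:")
doc.add_node("t2", text="12.50")
doc.add_node("t3", text="Price:")
doc.add_node("t4", text="8.00")
doc.add_node("t5", text="Catalogue")
doc.add_edge("t1", "t2", "left_of")
doc.add_edge("t3", "t4", "left_of")
doc.add_edge("t5", "t1", "above")

# Wrapper: a "Price:" label with a numeric item directly to its right.
wrapper = Wrapper(
    conditions={
        "label": lambda attrs: attrs["text"] == "Price:",
        "value": lambda attrs: is_number(attrs["text"]),
    },
    edges={("label", "value"): "left_of"},
)

matches = list(find_instances(doc, wrapper))
values = [doc.nodes[m["value"]]["text"] for m in matches]
```

In this toy example each match binds the wrapper's "label" and "value" nodes to one label/value pair on the page, so the extracted values are the texts of the matched "value" nodes. A practical implementation would likely prune candidates far more aggressively than this exhaustive search.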


Associated projects:
Project lead Reinhard Pichler:
GraphWrap - Graph-Based Wrapping from PDF Documents


Generated from the publication database of the Technische Universität Wien.