

Contributions to conference proceedings:

Boutros El-Gamil, W. Winiwarter:
"Accelerating Structured Web Crawling without Losing Data";
in: "Proceedings of the International Conference on Information Integration and Web-based Applications", ACM, 2013, ISBN: 978-1-4503-2113-6, 5 pp.



Abstract (English):
The trade-off between the size of retrieved data and crawling
time is a well-known dilemma in the structured Web crawling
community. The real challenge within this dilemma is to
optimize the settings of a given wrapper so as to obtain the
maximum available data in the shortest possible time. In this
paper, we try to tune these settings by introducing a threaded
algorithm that guarantees access to all available detail pages
within the crawling scope. Using this algorithm, we then try
to reduce the time consumed by the crawler through simple
adjustments of the sleeping time after each detail page visit.
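
The abstract does not spell out the algorithm itself. As a rough illustration of the general idea only (worker threads that drain a shared queue of detail pages, with a tunable sleep after each visit), a minimal sketch might look like the following. The names `crawl_detail_pages`, `fetch`, `sleep_seconds`, and `num_threads` are hypothetical and not taken from the paper, and the sketch assumes that `fetch` does not raise.

```python
import threading
import time
from queue import Queue, Empty


def crawl_detail_pages(urls, fetch, sleep_seconds=0.0, num_threads=4):
    """Visit every URL in `urls` once and return {url: fetch(url)}.

    Hypothetical sketch, not the authors' algorithm: `fetch` is any
    callable that takes a URL and returns the page content.
    """
    pending = Queue()
    for url in urls:
        pending.put(url)

    results = {}
    lock = threading.Lock()

    def worker():
        while True:
            try:
                url = pending.get_nowait()
            except Empty:
                return  # queue drained: no detail pages left for this worker
            page = fetch(url)
            with lock:
                results[url] = page
            # Tunable pause after each detail page visit.
            time.sleep(sleep_seconds)

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Because each URL is taken from the queue exactly once and the function returns only after all workers have finished, every queued page is visited regardless of the chosen `sleep_seconds`. Lowering that value shortens the total crawl time, at the cost of a higher request rate against the target site.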

Keywords:
Structured Web Crawling, Web Wrappers, Online Databases


Electronic version of the publication:
http://publik.tuwien.ac.at/files/PubDat_223850.pdf


Generated from the publication database of the Technische Universität Wien.