

Contributions to Proceedings:

M. Lupu:
"Bootstrapping a Comparable Corpus from Patent Family Members";
in: "9th International Workshop on Text-based Information Retrieval", 2012.



English abstract:
We present a method to generate comparable corpora from different
patent documents covering the same invention. We rely on the fact
that many inventors apply for protection in more than one
jurisdiction. Often, these jurisdictions have different publication
languages, and therefore the same invention is described in more
than one language. We use this fact to generate comparable corpora
for any language pair in which patent documents are available. We do
this at the level of the title, abstract, description and claims, and
present statistics for the English-Spanish data thus generated. We
then show that with an additional filtering step we can reduce the
errors introduced into the collection by the automated procedure.
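
The pairing idea in the abstract can be illustrated with a minimal sketch. It is not the author's implementation: the record layout (`family_id`, `lang`, and one key per section), the function names, and in particular the length-ratio filter are assumptions made for illustration, since the abstract does not say how the filtering step works.

```python
from collections import defaultdict
from itertools import product

# Sections named in the abstract; the dictionary keys are hypothetical.
SECTIONS = ("title", "abstract", "description", "claims")


def build_comparable_pairs(documents, lang_a, lang_b):
    """Pair sections of documents that share a family ID but differ in language."""
    # family_id -> language -> list of documents
    families = defaultdict(lambda: defaultdict(list))
    for doc in documents:
        families[doc["family_id"]][doc["lang"]].append(doc)

    pairs = []
    for family_id, by_lang in families.items():
        docs_a = by_lang.get(lang_a, [])
        docs_b = by_lang.get(lang_b, [])
        for doc_a, doc_b in product(docs_a, docs_b):
            for section in SECTIONS:
                text_a, text_b = doc_a.get(section), doc_b.get(section)
                # Only pair a section when both documents provide it.
                if text_a and text_b:
                    pairs.append((family_id, section, text_a, text_b))
    return pairs


def length_ratio_filter(pairs, max_ratio=2.0):
    """Assumed filter: drop pairs whose word counts differ too much."""
    kept = []
    for family_id, section, text_a, text_b in pairs:
        len_a, len_b = len(text_a.split()), len(text_b.split())
        ratio = max(len_a, len_b) / max(1, min(len_a, len_b))
        if ratio <= max_ratio:
            kept.append((family_id, section, text_a, text_b))
    return kept


# Toy records, invented for this example.
docs = [
    {"family_id": "F1", "lang": "en", "title": "Wind turbine blade",
     "abstract": "A blade for a wind turbine"},
    {"family_id": "F1", "lang": "es", "title": "Pala de turbina eólica",
     "abstract": "Una pala"},
    {"family_id": "F2", "lang": "en", "title": "Solar panel"},
]
pairs = build_comparable_pairs(docs, "en", "es")
filtered = length_ratio_filter(pairs)
```

Grouping by family first keeps the pairing independent of the language pair, so the same records could serve any two languages present in the collection. Family F2 yields no pair here because it has no Spanish member.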

Keywords:
patent, translation, corpora


Electronic version of the publication:
http://publik.tuwien.ac.at/files/PubDat_213939.pdf


Created from the Publication Database of the Vienna University of Technology.