Bilingual term pairs extracted from comparable Web resources using the TaaS Bilingual Term Extraction System
The resource contains bilingual term pairs automatically extracted from comparable resources found on the Web using the TaaS Bilingual Term Extraction System. The bilingual term extraction workflow consisted of the following steps:
1) Focussed Monolingual Crawler, for collecting comparable corpora from the Web and extracting plaintext.
2) DictMetric, for cross-lingual document-level alignment of the collected comparable corpora.
3) Tilde's Wrapper System for CollTerm, for identifying terms in plaintext documents.
4) Term normalisation tools developed by Tilde and the University of Sheffield, for deriving normalised (canonical) forms of terms that appear in different surface forms.
5) MPAligner, for extracting bilingual term pairs (aligning terms) from term-tagged Wikipedia document pairs.
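As a rough illustration of steps 4 and 5 above, the following Python sketch normalises tagged terms and pairs them across one aligned document pair. It is not the TaaS implementation: `normalise`, `align_terms`, `TermPair`, the seed dictionary, and the sample terms are all hypothetical stand-ins, and the dictionary lookup is a deliberately simple substitute for the alignment performed by MPAligner.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TermPair:
    """One extracted bilingual term pair, stored in canonical form."""
    source: str
    target: str


def normalise(term: str) -> str:
    # Toy canonical form: lowercase and collapse whitespace.
    # Assumption: the real normalisation tools do considerably more than this.
    return " ".join(term.lower().split())


def align_terms(src_terms, tgt_terms, seed_dict):
    # Pair a source term with a target term when a (hypothetical) seed
    # dictionary maps the source canonical form to a target canonical form
    # that is also tagged in the target document.
    tgt_canonical = {normalise(t) for t in tgt_terms}
    pairs = []
    for term in src_terms:
        translation = seed_dict.get(normalise(term))
        if translation is not None and normalise(translation) in tgt_canonical:
            pairs.append(TermPair(normalise(term), normalise(translation)))
    return pairs


# Invented example data: terms tagged in one aligned English-German document pair.
src_terms = ["Data  Protection", "term extraction"]
tgt_terms = ["Datenschutz", "Sprache"]
seed_dict = {"data protection": "datenschutz"}

pairs = align_terms(src_terms, tgt_terms, seed_dict)
for pair in pairs:
    print(pair.source, "<->", pair.target)
```

Only the first source term is paired here, because the second has no entry in the seed dictionary. Storing canonical forms rather than surface forms means that spelling variants of the same term collapse into a single pair.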