Bilingual term pairs extracted from Wikipedia using the TaaS Bilingual Term Extraction System
The resource contains bilingual term pairs automatically extracted from Wikipedia using the TaaS Bilingual Term Extraction System. The bilingual term extraction workflow consisted of the following steps:
1) The Wikipedia Retrieval tool acquired and extracted plaintext documents and cross-lingual document links from Wikipedia data dumps.
2) The TaaS Domain Classifier tool, developed by the University of Sheffield, classified the Wikipedia documents by domain.
3) Tilde's Wrapper System for CollTerm identified terms in the plaintext documents.
4) Term normalisation tools developed by Tilde and the University of Sheffield derived normalised (canonical) forms of terms from their different surface forms.
5) MPAligner extracted bilingual term pairs (aligned terms) from term-tagged Wikipedia document pairs.
6) The TaaS domain classifier for term pairs assigned term pairs to particular TaaS domains, using the domain classification information and the presence of terms in different documents.
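
To make the data flow of the workflow concrete, the following is a minimal Python sketch of how steps 2 to 6 hand their results to one another. It is an illustration only, not the TaaS implementation: every function and class name (Doc, classify_domain, tag_terms, normalise, align_terms, classify_pairs, run_pipeline) is hypothetical, each stage is replaced by a trivial stand-in, and the example documents, keywords, and seed dictionary are invented toy data. Step 1 (retrieval from Wikipedia dumps) is assumed to have already produced a linked document pair.

```python
from dataclasses import dataclass, field

# Hypothetical stand-ins only: none of these functions is the API of the
# Wikipedia Retrieval tool, CollTerm, MPAligner, or the TaaS classifiers.


@dataclass
class Doc:
    lang: str
    text: str
    domain: str = ""
    terms: list = field(default_factory=list)


def classify_domain(doc, domain_keywords):
    # Step 2 stand-in: choose the domain whose keywords occur most often.
    text = doc.text.lower()
    scores = {d: sum(text.count(k) for k in kws)
              for d, kws in domain_keywords.items()}
    doc.domain = max(scores, key=scores.get)
    return doc


def tag_terms(doc, term_list):
    # Step 3 stand-in: look up known terms instead of running a term tagger.
    text = doc.text.lower()
    doc.terms = [t for t in term_list if t in text]
    return doc


def normalise(term):
    # Step 4 stand-in: lower-case and collapse whitespace. Real normalisation
    # maps inflected surface forms to canonical forms.
    return " ".join(term.lower().split())


def align_terms(src, tgt, seed_dictionary):
    # Step 5 stand-in: pair terms through a seed dictionary lookup.
    tgt_terms = {normalise(t) for t in tgt.terms}
    pairs = []
    for s in src.terms:
        candidate = seed_dictionary.get(normalise(s))
        if candidate is not None and candidate in tgt_terms:
            pairs.append((normalise(s), candidate))
    return pairs


def classify_pairs(pairs, src, tgt):
    # Step 6 stand-in: keep the domain only when both documents agree on it.
    domain = src.domain if src.domain == tgt.domain else "unknown"
    return [(s, t, domain) for s, t in pairs]


def run_pipeline(src, tgt, domain_keywords, terms_by_lang, seed_dictionary):
    for doc in (src, tgt):
        classify_domain(doc, domain_keywords)
        tag_terms(doc, terms_by_lang[doc.lang])
    return classify_pairs(align_terms(src, tgt, seed_dictionary), src, tgt)


# Toy input: one cross-lingually linked document pair (invented data).
en = Doc("en", "A neural network is a machine learning model.")
de = Doc("de", "Ein neuronales Netz ist ein Modell.")
result = run_pipeline(
    en,
    de,
    domain_keywords={"it": ["network", "netz"], "law": ["court", "gericht"]},
    terms_by_lang={"en": ["neural network"], "de": ["neuronales netz"]},
    seed_dictionary={"neural network": "neuronales netz"},
)
print(result)
```

In this sketch each stage only enriches or consumes the shared Doc record, which mirrors the order of the workflow: documents are classified and term-tagged first, terms are normalised before alignment, and domain labels are attached to term pairs last.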