Strongly Comparable and Aligned Legal News EN-FR-RO News Corpus
ID: RACAIEnFrRoNewsCorpus
This corpus is a collection of strongly comparable English, French and Romanian documents collected from the http://ec.europa.eu/ website. The documents are sentence-split, POS-tagged, lemmatized and chunked, and are sentence-aligned using Moore's sentence aligner (http://research.microsoft.com/pubs/68886/sent-align2-amta-final.pdf).
Distribution
Availability: Available - Restricted Use
Licence: MS Commons BY-NC-ND
Restrictions: Academic - Non Commercial Use, Attribution, Inform Licensor, No Redistribution
User Nature: Academic, Commercial
Attribution Details: Please cite this paper: "Radu Ion, Dan Tufiş, Tiberiu Boroş, Alexandru Ceauşu, and Dan Ştefănescu. On-Line Compilation of Comparable Corpora and their Evaluation. In Marko Tadić, Mila Dimitrova-Vulchanova, and Svetla Koeva (eds.), Proceedings of The 7th International Conference Formal Approaches to South Slavic and Balkan Languages (FASSBL-7), pp. 29–34, Croatian Language Technologies Society – Faculty of Humanities and Social Sciences, Zagreb, Croatia, October 2010. ISBN: 978-953-55375-2-6."
Distribution Access/Medium: Accessible Through Interface, Downloadable
Distribution rights holders:
IPR Holder
Contact Person
Multilingual text corpus
Languages:
French (1,809 Files)
English (1,848 Files)
Romanian (966 Files)
Linguality
Linguality type: Multilingual
Multi-linguality type: Comparable (A parallel sub-corpus is extracted.)
Text Format Size
Character encoding: UTF-8
Domains
Legal news
Conforms to: Other
Modalities
Time Coverage: Years 2010-2011
Creation
Creation mode: Automatic
Resource Creation
Metadata
Created: 12/18/2012
Last Updated: 12/18/2012
Source: METANET4U
3.0
Metadata Language: English (en)
Validation
Validated
Type of Validation: Formal
Validation Mode: Automatic
Mode Details: Parsed for XCES conformance.
Extent: Full
Documentation
Tool Documentation: Manual
Document Type: Manual
Keywords: parallel corpus, English, Romanian, French, POS-tagged, lemmatized, chunked
Document Language: English