Strongly Comparable and Aligned Legal News EN-FR-RO News Corpus
ID: RACAIEnFrRoNewsCorpus
This corpus is a collection of strongly comparable English, French and Romanian documents collected from a website. The documents are sentence-split, POS-tagged, lemmatized and chunked, and they are also sentence-aligned using Moore's sentence aligner.
Distribution
Availability: Available - Restricted Use
Licence: MS Commons BY-NC-ND
Restrictions: Academic - Non Commercial Use, Attribution, Inform Licensor, No Redistribution
User Nature: Academic, Commercial
Attribution Details: Please cite this paper: "Radu Ion, Dan Tufiş, Tiberiu Boroş, Alexandru Ceauşu, and Dan Ştefănescu. On-Line Compilation of Comparable Corpora and their Evaluation. In Marko Tadić, Mila Dimitrova-Vulchanova, and Svetla Koeva (eds.), Proceedings of The 7th International Conference Formal Approaches to South Slavic and Balkan Languages (FASSBL-7), pp. 29–34, Croatian Language Technologies Society – Faculty of Humanities and Social Sciences, Zagreb, Croatia, October 2010. ISBN: 978-953-55375-2-6."
Distribution Access/Medium: Accessible Through Interface, Downloadable
Multilingual text corpus
Languages:
(1,809 Files)
(1,848 Files)
(966 Files)
Linguality
Linguality type: Multilingual
Multi-linguality type: Comparable (A parallel sub-corpus is extracted.)
Text
Character encoding: UTF-8
Domain: Legal news
Conforms to: Other
Time coverage: 2010-2011
Creation
Creation mode: Automatic
Metadata
Created: 12/18/2012
Last Updated: 12/18/2012
Metadata Language:
Validation
Validated
Type of validation: Formal
Validation Mode: Automatic
Mode Details: Parsed for XCES conformance.
Extent: Full
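The record states that the documents are sentence-aligned with Moore's aligner and were parsed for XCES conformance, but it does not describe the file layout. As a minimal sketch, assuming the alignments are stored in the common XCES `cesAlign` style (`linkGrp` elements carrying `fromDoc`/`toDoc`, and `link` elements whose `xtargets` attribute lists source and target sentence IDs separated by `;`), the aligned sentence ID pairs could be read with Python's standard library as follows. The sample document, its file paths and the `read_links` helper are hypothetical and should be checked against the distributed files.

```python
import xml.etree.ElementTree as ET

# Hypothetical XCES-style alignment fragment. The cesAlign/linkGrp/link names
# follow the common XCES convention; whether this corpus uses them is an assumption.
SAMPLE = """<cesAlign version="1.0">
  <linkGrp targType="s" fromDoc="en/doc1.xml" toDoc="ro/doc1.xml">
    <link xtargets="s1;s1"/>
    <link xtargets="s2 s3;s2"/>
    <link xtargets=";s3"/>
  </linkGrp>
</cesAlign>"""


def read_links(xml_text):
    """Return (from_doc, to_doc, source_ids, target_ids) for every link element."""
    root = ET.fromstring(xml_text)
    pairs = []
    for grp in root.iter("linkGrp"):
        from_doc = grp.get("fromDoc")
        to_doc = grp.get("toDoc")
        for link in grp.iter("link"):
            # xtargets is assumed to be "source IDs;target IDs", space-separated on each side.
            src, _, tgt = link.get("xtargets", "").partition(";")
            # An empty side yields an empty tuple (a 1:0 or 0:1 alignment).
            pairs.append((from_doc, to_doc, tuple(src.split()), tuple(tgt.split())))
    return pairs


if __name__ == "__main__":
    for from_doc, to_doc, src, tgt in read_links(SAMPLE):
        print(f"{from_doc} {list(src)} <-> {to_doc} {list(tgt)}")
```

Keeping the sentence IDs as tuples preserves many-to-one links (such as `s2 s3;s2`) instead of forcing them into 1:1 pairs; a user who only wants 1:1 sentence pairs can filter on both tuples having length 1.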
Documentation
Tool documentation: Manual
Document Type: Manual
Keywords: parallel corpus, English, Romanian, French, POS-tagged, lemmatized, chunked
Document Language: