EUROPARL Corpus Parallel Corpora: Portuguese-English

http://catalog.elra.info/product_info.php?products_id=1257

ID: ELRA-W0090

The EUROPARL Corpus (Portuguese-English subpart of the parallel corpora) was extracted from the proceedings of the European Parliament. It contains transcriptions of sessions from 1996 to 2011, with a total of approximately 58,324,562 tokens of European Portuguese (L1) and 49,216,896 tokens of English (translation).

The EUROPARL Corpus consists of one text file for the English corpus and two files for the Portuguese version: a text file and an annotated file. The text file contains plain text with no further annotation. The annotated Portuguese file is a four-column file with one token per line, each token followed by its PoS tag and lemma. The corpus was automatically PoS-tagged with the MBT tagger (http://ilk.uvt.nl/mbt/) and lemmatized with MBLEM (http://ilk.uvt.nl/mbma/), following the annotation scheme of the Reference Corpus of Contemporary Portuguese.
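As an illustration of the annotated format described above, the following minimal Python sketch reads token-per-line records into (token, PoS tag, lemma) triples. It rests on assumptions not stated in this record: that columns are separated by whitespace, that the first three columns are the token, the PoS tag, and the lemma in that order, and that any further column can be carried along uninterpreted. The names `AnnotatedToken` and `parse_annotated_lines`, and the tag strings in the sample, are hypothetical placeholders and are not taken from the corpus or its tagset.

```python
from typing import Iterable, Iterator, NamedTuple, Tuple


class AnnotatedToken(NamedTuple):
    token: str
    pos: str
    lemma: str
    extra: Tuple[str, ...]  # any further columns, kept uninterpreted


def parse_annotated_lines(lines: Iterable[str]) -> Iterator[AnnotatedToken]:
    """Yield one AnnotatedToken per well-formed line.

    Assumes whitespace-separated columns in the order token, PoS tag, lemma.
    Lines with fewer than three fields (e.g. blank separator lines) are skipped.
    """
    for line in lines:
        fields = line.split()
        if len(fields) < 3:
            continue
        yield AnnotatedToken(fields[0], fields[1], fields[2], tuple(fields[3:]))


# Made-up sample lines; the tags "TAG_A"/"TAG_B" and the fourth field "X"
# are placeholders, not real values from the corpus.
sample = [
    "as TAG_A o X",
    "casas TAG_B casa X",
    "",
    "malformed",
]

parsed = list(parse_annotated_lines(sample))
```

To apply the same function to the distributed file, an open file handle can be passed in place of `sample`, since the parser only iterates over lines. The actual separator and column order should be checked against the corpus documentation before relying on this.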

Distribution

Availability

Available - Restricted Use

Start date: 01/20/2016

Licence

ELRA VAR

Restrictions: Commercial Use

For Members of ELRA

Fee: 0.00

User Nature: Academic

ELRA END USER

Restrictions: Academic - Non Commercial Use

For Non Members of ELRA

Fee: 0.00

User Nature: Academic

ELRA VAR

Restrictions: Commercial Use

For Non Members of ELRA

Fee: 0.00

User Nature: Academic

ELRA END USER

Restrictions: Academic - Non Commercial Use

For Non Members of ELRA

Fee: 0.00

User Nature: Commercial

ELRA VAR

Restrictions: Commercial Use

For Non Members of ELRA

Fee: 0.00

User Nature: Commercial

ELRA END USER

Restrictions: Academic - Non Commercial Use

For Members of ELRA

Fee: 0.00

User Nature: Academic

ELRA END USER

Restrictions: Academic - Non Commercial Use

For Members of ELRA

Fee: 0.00

User Nature: Commercial

ELRA VAR

Restrictions: Commercial Use

For Members of ELRA

Fee: 0.00

User Nature: Commercial

Contact Person

Mapelli Valérie

text

Monolingual text corpus

Languages

English Portuguese

Linguality

Linguality type: Monolingual

Multi-linguality type: Parallel

Size

no size available

Metadata

Created: 05/12/2005

Version

Version: 1.0

Last Updated: 01/20/2016
