LECTRA (LECture TRAnscriptions in European Portuguese)
Corpus LECTRA (LECture TRAnscriptions in European Portuguese)
The LECTRA corpus (classroom LECture TRAnscriptions in European Portuguese) consists of audio recordings and their manual transcriptions. It covers seven one-semester university courses. All lectures were taught at the Technical University of Lisbon (IST) and recorded in the presence of students, except IICT, which was recorded at another university in a quiet office environment and targeted an Internet audience. The corpus contains a total of 28 hours of speech, manually transcribed by several trained annotators.
The corpus comprises technical university lectures from the following courses: Production of Multimedia Contents (PMC), Economic Theory I (ETI), Linear Algebra (LA), Introduction to Informatics and Communication Techniques (IICT), Object Oriented Programming (OOP), Accounting (CONT), and Graphical Interfaces (GI).
Two files per lecture are provided:
a) a RAW file: the audio recording
b) a TRS file: the manual transcription. TRS is an XML-based format that standard transcription software such as Transcriber can open. Annotations in the TRS files are at the word level; the transcriptions are fine-grained and include disfluencies. The TRS files are encoded in ISO-8859-1 (Latin-1).
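As an illustration of how a TRS file might be read programmatically, the sketch below extracts time-stamped text segments with Python's standard library. It is a minimal example under stated assumptions, not the corpus's official tooling: it assumes the files follow the usual Transcriber layout (Turn elements containing Sync time markers, with the transcribed text in between) and that each file declares its ISO-8859-1 encoding in the XML header, so the parser decodes it automatically. The function name read_trs_segments and the embedded sample are hypothetical and are not taken from the corpus.

```python
import io
import xml.etree.ElementTree as ET


def read_trs_segments(source):
    """Return (start_time, text) pairs from a Transcriber-style TRS document.

    `source` is a file path or a binary file object. The element and
    attribute names (Turn, Sync, startTime, time) are assumed from the
    general Transcriber format, not confirmed for this corpus.
    """
    root = ET.parse(source).getroot()
    segments = []

    def flush(start, parts):
        # Collapse whitespace and keep only non-empty segments.
        text = " ".join("".join(parts).split())
        if text:
            segments.append((start, text))

    for turn in root.iter("Turn"):
        start = float(turn.get("startTime", "0"))
        parts = [turn.text or ""]
        for child in turn:
            if child.tag == "Sync":
                # A Sync marker closes the previous segment and opens a new one.
                flush(start, parts)
                start = float(child.get("time", start))
                parts = []
            # Text following any child (Sync, Event, ...) belongs to the segment.
            parts.append(child.tail or "")
        flush(start, parts)
    return segments


# Hypothetical in-memory sample, encoded as the corpus files are said to be.
SAMPLE = (
    '<?xml version="1.0" encoding="ISO-8859-1"?>\n'
    '<Trans><Episode><Section type="report" startTime="0" endTime="5">'
    '<Turn startTime="0" endTime="5"><Sync time="0"/>\nbom dia\n'
    '<Sync time="2.5"/>\nálgebra <Event desc="noise"/> linear\n'
    '</Turn></Section></Episode></Trans>'
).encode("iso-8859-1")

segments = read_trs_segments(io.BytesIO(SAMPLE))
for start, text in segments:
    print(start, text)
```

Passing a path or a binary file object (rather than an already decoded string) lets the XML parser honor the encoding declared in the file, which avoids mis-decoding accented Portuguese characters. Event markers such as noises are skipped here; a fuller reader would likely keep them, since the transcriptions annotate disfluencies.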
The TRS files have a total of 220K word tokens (Training set: 179K word tokens, Development set: 21K word tokens, Test set: 20K word tokens). The whole resource occupies 3.3 GB.
For a complete description of the corpus and for Automatic Speech Recognition results, see:
(Trancoso et al., 2008) Isabel Trancoso, Rui Martins, Helena Moniz, Ana Isabel Mata da Silva, Maria do Céu Guerreiro Viana Ribeiro, The LECTRA Corpus - Classroom Lecture Transcriptions in European Portuguese, In LREC 2008 - Language Resources and Evaluation Conference, Marrakesh, Morocco, May 2008.
(Pellegrini et al., 2012) Thomas Pellegrini, Helena Moniz, Fernando Batista, Isabel Trancoso, Ramon Fernandez Astudillo, Extension of the LECTRA corpus: classroom LECture TRAnscriptions in European Portuguese, In SPEECH AND CORPORA, Belo Horizonte, March 2012.