ISLE Speech Corpus
ID: ELRA-S0083
Approximately 20 minutes of speech per speaker from 23 German and 23 Italian intermediate learners of English. Each speaker recorded sentences from several blocks of differing types (reading simple sentences, producing minimal pairs, answering multiple-choice questions). The prompts varied in perplexity.
About two-thirds of the data for each speaker was annotated by one member of a team of linguists. The files were first corrected at the word level, and an automatic recognizer was then used to produce phone-level annotations. The annotator then went through each sentence again to mark phone and stress errors (e.g., substitutions, insertions, or deletions).
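To illustrate the three phone-error categories named above, the following is a minimal sketch that compares a canonical phone sequence with a realized one and labels the differences. It is not the ISLE annotation procedure, which combined an automatic recognizer with manual correction. The function name `phone_errors`, the phone symbols, and the use of Python's `difflib` for alignment are all assumptions made for illustration.

```python
from difflib import SequenceMatcher


def phone_errors(canonical, realized):
    """Label differences between two phone sequences (hypothetical helper).

    Returns a list of (error_type, canonical_phones, realized_phones) tuples.
    """
    errors = []
    # autojunk=False keeps difflib from discarding frequent symbols on long inputs.
    matcher = SequenceMatcher(a=canonical, b=realized, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "replace":
            errors.append(("substitution", canonical[i1:i2], realized[j1:j2]))
        elif tag == "delete":
            errors.append(("deletion", canonical[i1:i2], []))
        elif tag == "insert":
            errors.append(("insertion", [], realized[j1:j2]))
        # "equal" spans are correctly realized phones and are skipped.
    return errors


# Illustrative phone symbols only; they are not taken from the corpus.
example = phone_errors(["th", "ih", "ng", "k"], ["s", "ih", "ng", "k", "ax"])
```

In this made-up example the first phone is replaced and an extra final vowel is added, so `example` contains one substitution and one insertion.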
Corpus details:
* a total of 46 speakers (23 German and 23 Italian)
* 11484 utterances
* 1.92 gigabytes of WAV files (4 CDs)
* 17 hours, 54 minutes, and 44 seconds of speech data
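As a quick consistency check, the figures above can be turned into per-speaker and per-utterance averages. The sketch below uses only the numbers listed in the corpus details; the function and variable names are illustrative.

```python
from datetime import timedelta

# Figures taken from the corpus details above.
SPEAKERS = 46
UTTERANCES = 11484
TOTAL_SPEECH = timedelta(hours=17, minutes=54, seconds=44)


def corpus_averages(total=TOTAL_SPEECH, speakers=SPEAKERS, utterances=UTTERANCES):
    """Return simple averages derived from the published corpus totals."""
    total_seconds = total.total_seconds()
    return {
        "minutes_per_speaker": total_seconds / speakers / 60,
        "utterances_per_speaker": utterances / speakers,
        "seconds_per_utterance": total_seconds / utterances,
    }


averages = corpus_averages()
```

The result is about 23.4 minutes and roughly 250 utterances per speaker, which is in line with the stated figure of approximately 20 minutes of speech per speaker, and about 5.6 seconds per utterance.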
A much more detailed explanation of the ISLE corpus will be available in the proceedings of LREC 2000. An electronic copy of this paper may be obtained by sending an email to Dr. Wolfgang Menzel at <menzel@nats.informatik.uni-hamburg.de>.
W. Menzel, E. Atwell, P. Bonaventura, D. Herron, P. Howarth, R. Morton, and C. Souter. "The ISLE corpus of non-native spoken English", Proc. Second LREC.