LINGUIST List 18.3424

Sat Nov 17 2007

Software: ELRA Language Resources Catalogue 11/07-2

Editor for this issue: Hannah Morales <hannahlinguistlist.org>


Directory
1. Hélène Mazo, ELRA Language Resources Catalogue 11/07-2


Message 1: ELRA Language Resources Catalogue 11/07-2
Date: 16-Nov-2007
From: Hélène Mazo <mazoelda.org>
Subject: ELRA Language Resources Catalogue 11/07-2

ELRA is happy to announce that 6 new Speech Resources from the TC-STAR project are now available in its catalogue.

ELRA-S0249 TC-STAR English Training Corpora for ASR: Transcriptions of EPPS Speech:
This corpus consists of transcriptions from 92 hours of EPPS (European Parliament Plenary Sessions) speeches held or interpreted in European English (a mixture of native and non-native English). The transcription files are stored in Transcriber XML file format. For corresponding recordings, see ELRA-S0251. For more information, see: http://catalog.elra.info/product_info.php?products_id=1032

ELRA-S0250 TC-STAR English-Spanish Training Corpora for Machine Translation: Aligned Final Text Editions of EPPS:
This corpus consists of 34 million (English) and 38 million (Spanish) running words of sentence-segmented and aligned bilingual texts in English and Spanish, obtained from the Final Text Editions provided by the European Parliament (from April 1996 to Sept. 2004, Dec. 2004 to May 2005, and Dec. 2005 to May 2006). The data is accompanied by tools for further preprocessing. For more information, see: http://catalog.elra.info/product_info.php?products_id=1033

ELRA-S0251 TC-STAR English Training Corpora for ASR: Recordings of EPPS Speech:
This corpus consists of recordings of around 290 hours of EPPS (European Parliament Plenary Sessions) speeches held or interpreted in European English, 92 hours of which were annotated (transcribed); the transcriptions are not provided in the present package. Each file contains a single channel with 16-bit resolution at a sample rate of 16 kHz. For corresponding transcriptions, see ELRA-S0249. For more information, see: http://catalog.elra.info/product_info.php?products_id=1034

ELRA-S0252 TC-STAR Spanish Training Corpora for ASR: Recordings of EPPS Speech:
This corpus consists of the recordings of around 283 hours from EPPS (European Parliament Plenary Sessions) speeches held or interpreted in European Spanish (a mixture of native and non-native Spanish). Each file contains a single channel with 16-bit resolution at a sample rate of 16 kHz. For more information, see: http://catalog.elra.info/product_info.php?products_id=1036

ELRA-S0253 TC-STAR English Test Corpora for ASR:
This corpus consists of 70 hours of recordings of EPPS (European Parliament Plenary Sessions) speeches held or interpreted in European English and other European languages. From this corpus, 16 hours of English speeches (native or non-native) were annotated (transcribed). Each speech file contains a single channel with 16-bit resolution at a sample rate of 16 kHz. The transcription files are stored in Transcriber XML file format. For more information, see: http://catalog.elra.info/product_info.php?products_id=1037

ELRA-S0254 TC-STAR Spanish Test Corpora for ASR:
This corpus consists of 174 hours of recordings of EPPS (European Parliament Plenary Sessions) speeches held or interpreted in European Spanish and other European languages. From this corpus, 16 hours of Spanish speeches were annotated (transcribed). Each audio file contains a single channel with 16-bit resolution at a sample rate of 16 kHz. The transcription files are stored in Transcriber XML file format. For more information, see: http://catalog.elra.info/product_info.php?products_id=1038

For more information on the catalogue, please contact Valérie Mapelli <mapellielda.org>

Visit our on-line catalogue: http://catalog.elra.info.

Linguistic Field(s): Computational Linguistics; Text/Corpus Linguistics