LINGUIST List 29.4711

Wed Nov 28 2018

Software: Tools and Resources for Written and Oral French / Outils et Ressources pour le Français Ecrit et Oral

Editor for this issue: Everett Green <everettlinguistlist.org>



Date: 21-Nov-2018
From: Jeanne-Marie Debaisieux <jeanne-marie.debaisieuxSorbonne-Nouvelle.fr>
Subject: Tools and Resources for Written and Oral French / Outils et Ressources pour le Français Ecrit et Oral

Orfeo is a portal which gives access to the Corpus for the Study of Contemporary French (CEFC). The corpus consists of 10 million words:

- 4 million words of transcribed spoken French, from about 350 hours of recordings collected in France, Switzerland and Belgium in different diaphasic situations (face-to-face conversations; interviews, debates, and classroom interactions; lectures, sermons, and speeches; and radio and television programs).

- 6 million words of written texts from a wide range of genres (e.g. literature, scientific texts, regional and national press, essays, academic and non-standard writings).


- CEFC is freely available on the portal:

https://www.ortolang.fr/market/corpora/cefc-orfeo

- The portal gives access to the audio files and textual resources. The corpus can be searched by the textual and register variables recorded in the metadata, as well as by lexical and morpho-syntactic (POS) annotations. The entire corpus is semi-automatically annotated with syntactic dependencies, and the search tool can return dependency patterns. All queries return orthographic transcriptions aligned with the audio files. Guides are provided for all types of annotation. All files (texts, sounds and annotations) are freely downloadable.

Linguistic Field(s): Text/Corpus Linguistics

Subject Language(s): French (fra)


Page Updated: 28-Nov-2018