LINGUIST List 29.2267
Fri May 25 2018
Software: Database for Spoken German (DGD) v2.10
Editor for this issue: Kenneth Steimel <ken@linguistlist.org>
Date: 25-May-2018
From: Thomas Schmidt <thomas.schmidt@ids-mannheim.de>
Subject: Database for Spoken German (DGD) v2.10
This week, we released version 2.10 of the Database for Spoken German (Datenbank für Gesprochenes Deutsch, DGD) at: https://dgd.ids-mannheim.de
After a one-time registration, the DGD is free to use for research and teaching. The DGD provides access to oral corpora from the Archive for Spoken German (http://agd.ids-mannheim.de). Among the resources available in the DGD are:
- FOLK, the Research and Teaching Corpus of Spoken German - a 230h (2.2 million tokens) collection of audio and video recordings of authentic interaction in private, institutional and public settings. All data have been transcribed according to the GAT convention, aligned with the recordings, and annotated with orthographic normalisation, lemmatisation, and POS tagging according to STTS. For more information on FOLK, please see http://agd.ids-mannheim.de/folk.shtml
- GeWiss (''Gesprochene Wissenschaftssprache Kontrastiv'') - a 1-million-token corpus of spoken academic language (exam talks, student and expert presentations) collected by the GeWiss project in Leipzig, Wroclaw and Birmingham
- Deutsche Mundarten (German dialects, ''Zwirner-Korpus'') - the largest corpus documenting German dialects, and a series of other dialect corpora following a similar design
- Emigrantendeutsch in Israel (Emigrant German in Israel) - three collections of biographical interviews with German-speaking emigrants in Israel, collected in various projects by Anne Betten
- Monash Corpus of Australian German - a corpus collected by Michael Clyne documenting the language use of the German-speaking community in South Australia
The new version includes an extension of FOLK and two new corpora: RUDI (''Russlanddeutsche Dialekte''), with recordings of German speakers from the former Soviet Union, and BETV (''Belgische TV-Debatten''), with videos of TV debates from the German-speaking part of Belgium.
The DGD can be used to browse these data, to do systematic queries on metadata and transcripts, and to download excerpts from the corpora.
Linguistic Field(s): Applied Linguistics
General Linguistics
Language Documentation
Pragmatics
Sociolinguistics
Text/Corpus Linguistics
Subject Language(s):
German (deu)
Page Updated: 25-May-2018