LINGUIST List 29.2267

Fri May 25 2018

Software: Database for Spoken German (DGD) v2.10

Editor for this issue: Kenneth Steimel <ken@linguistlist.org>


Date: 25-May-2018
From: Thomas Schmidt <thomas.schmidt@ids-mannheim.de>
Subject: Database for Spoken German (DGD) v2.10

This week, we released version 2.10 of the Database for Spoken German (Datenbank für Gesprochenes Deutsch, DGD) at:

https://dgd.ids-mannheim.de

After a one-time registration, the DGD is free to use for research and teaching. The DGD provides access to oral corpora from the Archive for Spoken German (http://agd.ids-mannheim.de). Among the resources available in the DGD are:

- FOLK, the Research and Teaching Corpus of Spoken German - a 230-hour (2.2 million tokens) collection of audio and video recordings of authentic interaction in private, institutional and public settings. All data have been transcribed according to the GAT convention, aligned with the recordings, and annotated with orthographic normalisation, lemmatisation and POS tagging according to STTS. For more information on FOLK, please see http://agd.ids-mannheim.de/folk.shtml

- GeWiss ("Gesprochene Wissenschaftssprache Kontrastiv") - a corpus of 1 million tokens of spoken academic language (exam talks, student and expert presentations) collected by the GeWiss project in Leipzig, Wroclaw and Birmingham

- Deutsche Mundarten (German dialects, "Zwirner-Korpus") - the largest corpus documenting German dialects, along with a series of other dialect corpora following a similar design

- Emigrantendeutsch in Israel (Emigrant German in Israel) - three collections of biographical interviews with German-speaking emigrants in Israel, collected in various projects by Anne Betten

- Monash Corpus of Australian German - a corpus collected by Michael Clyne documenting language use in the German-speaking community in South Australia

The new version includes an extension of FOLK and two new corpora: RUDI ("Russlanddeutsche Dialekte") with recordings of German speakers from the former Soviet Union, and BETV ("Belgische TV-Debatten") with videos of TV debates from the German-speaking part of Belgium.

The DGD can be used to browse these data, to run systematic queries on metadata and transcripts, and to download excerpts from the corpora.

Linguistic Field(s): Applied Linguistics
                            General Linguistics
                            Language Documentation
                            Pragmatics
                            Sociolinguistics
                            Text/Corpus Linguistics

Subject Language(s): German (deu)

Page Updated: 25-May-2018