LINGUIST List 17.1967

Thu Jul 06 2006

Software: Two New Corpora of Spoken and Written English

Editor for this issue: Svetlana Aksenova <svetlana@linguistlist.org>


Directory

1. Christine Bowles, Two New Corpora of Spoken and Written English


Message 1: Two New Corpora of Spoken and Written English
Date: 06-Jul-2006
From: Christine Bowles <c.bowles@ucl.ac.uk>
Subject: Two New Corpora of Spoken and Written English


The Survey of English Usage at UCL is pleased to announce the publication of two exciting new corpora supplied with search software that allows for the retrieval of grammatical patterns and constructions.

THE DIACHRONIC CORPUS OF PRESENT-DAY SPOKEN ENGLISH (DCPSE)

This corpus contains a total of 800,000 words of grammatically analysed (tagged and parsed) spontaneous spoken English from comparable categories in the London-Lund Corpus (1960s/1970s) and the ICE-GB Corpus (1990s): 400,000 words from each corpus in the form of tree diagrams. DCPSE is designed to make it possible to study the grammatical features of spontaneous spoken English over time. DCPSE is the largest single collection of tagged and parsed orthographically transcribed spoken English in the world. The corpus will provide linguists interested in recent linguistic change in English with a new, searchable database. The corpus is supplied on CD, together with the ICECUP 3.1 search software (see below) and a 'Getting Started' manual.

RELEASE 2 OF THE BRITISH COMPONENT OF THE INTERNATIONAL CORPUS OF ENGLISH (ICE-GB)

ICE-GB contains one million words of grammatically analysed (tagged and parsed) spoken and written present-day British English in the form of tree diagrams. The material in Release 2 of the corpus has been synchronised with sound recordings for the spoken part of the corpus (a total of around 75 hours), which can be supplied separately. Together with Release 2 of ICE-GB we are pleased to announce the publication of ICECUP 3.1, the dedicated search software for ICE-GB and DCPSE (see above). New features in ICECUP 3.1 include a lexicon and a grammaticon, which can provide an overview of distributions of words, tags, and grammatical patterns. The Fuzzy Tree Fragment (FTF) facility, which allows searches for grammatical patterns, has been extended and improved. There are many other improvements to ICECUP in this release, e.g. a thoroughly revised on-line help manual covering all the new features. A new ICECUP 'Getting Started' manual is published with the corpus.

ICE-GB SOUND RECORDINGS

The sound recordings (75 hours) will be available in the form of a set of CDs containing uncompressed 'wave' files for installation on a hard disk.

For further details, including prices and upgrades, please visit:

http://www.ucl.ac.uk/english-usage/resources/sales.htm

or contact Christine Bowles: c.bowles@ucl.ac.uk

We offer very low prices for students. Please allow 4-6 weeks for delivery.

Linguistic Field(s): Historical Linguistics; Syntax; Text/Corpus Linguistics

Subject Language(s): English (eng)