LINGUIST List 19.3048

Wed Oct 08 2008

Software: Computational Ling/Text&Corpus Ling/Software for automatic text..

Editor for this issue: Susanne Vejdemo <>

        1.    Slava Yatsko, Software for automatic text processing

Message 1: Software for automatic text processing
Date: 08-Oct-2008
From: Slava Yatsko <>
Subject: Software for automatic text processing

Dear Colleagues,

The Computational Linguistics Laboratory at Katanov State University of Khakasia (CLL at KSU) is pleased to announce the release of Linguistic Toolbox (LIT), a package of programs for automatic text processing. LIT is a concordancer that differs from existing analogues in the following respects.

- It has an integrated part-of-speech tagger, which allows users to create their own annotated corpora. In-depth linguistic research is often based on a specific text genre (e.g. fiction, scientific texts), a linguistic category (e.g. possession), or the works of a particular author (e.g. Maugham). Publicly available annotated national corpora with evenly distributed genres often fail to meet the demands of such research, and LIT has been designed to fill this gap. With LIT, users can run various searches on their own corpora and obtain statistics on the distribution of words, patterns, and phrases.
- Union, subtraction, and intersection operations. In set theory these operations are used to construct new sets from existing ones. Why not perform them on texts, so as to construct new texts from existing ones? For example, the subtraction operation lets the user remove stopwords from a text, and the intersection operation yields a list of the words that occur in two or more texts, with raw counts assigned to each word. These functions may be useful for computing distances between texts for text classification and categorization.
- LIT has an integrated spreadsheet. Having obtained statistics with LIT, the user can perform computations in LIT itself without turning to commercial products such as MS Excel.
- LIT has an integrated WordNet module, with which the user can search not only for a given word but also for words semantically related to it.
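The idea of applying set operations to texts can be illustrated with a small sketch. This is not LIT's code, and the announcement does not specify LIT's exact semantics. The sketch assumes a crude regex tokenizer and defines union as summed word counts, subtraction as removing every word type that occurs in a second text (e.g. a stopword list), and intersection as the words shared by all texts together with each text's raw count. The Jaccard distance at the end is just one possible text distance, chosen here for illustration. All function names are hypothetical.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercased word tokens. Assumption: a crude regex, not LIT's tokenizer."""
    return re.findall(r"[a-z']+", text.lower())


def freq(text):
    """Raw frequency list of a text."""
    return Counter(tokenize(text))


def union(*texts):
    """Hypothetical union: every word from any text, with counts summed."""
    total = Counter()
    for t in texts:
        total.update(freq(t))
    return total


def subtract(text, removed):
    """Hypothetical subtraction: drop every word of `text` found in `removed`."""
    banned = set(tokenize(removed))
    return Counter({w: c for w, c in freq(text).items() if w not in banned})


def intersect(*texts):
    """Hypothetical intersection: words present in every text,
    mapped to a tuple of their raw counts (one count per text)."""
    counts = [freq(t) for t in texts]
    shared = set(counts[0]).intersection(*counts[1:])
    return {w: tuple(c[w] for c in counts) for w in sorted(shared)}


def jaccard_distance(a, b):
    """One illustrative distance: 1 - |shared vocabulary| / |combined vocabulary|."""
    va, vb = set(tokenize(a)), set(tokenize(b))
    if not va and not vb:
        return 0.0
    return 1 - len(va & vb) / len(va | vb)


if __name__ == "__main__":
    a = "The cat sat on the mat."
    b = "The dog sat on the log."
    print(intersect(a, b))    # {'on': (1, 1), 'sat': (1, 1), 'the': (2, 2)}
    print(subtract(a, "the on"))
    print(jaccard_distance(a, b))
```

Keeping per-text counts in the intersection, rather than collapsing them into one number, leaves the choice of distance measure open: the same output could feed a Jaccard, cosine, or other comparison.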

LIT is distributed as freeware and can be downloaded from the CLL's site. The current version supports English and runs on Windows machines.

V.Yatsko, Head of the CLL at KSU

Linguistic Field(s): Computational Linguistics; Text/Corpus Linguistics