LINGUIST List 31.1757
Wed May 27 2020
Media: Enhanced Large Scale Colloquial Persian Language Understanding (LSCP) Corpus
Editor for this issue: Everett Green <everettlinguistlist.org>
Date: 22-May-2020
From: Hadi Abdi Khojasteh <hadiabdikhojastehgmail.com>
Subject: Enhanced Large Scale Colloquial Persian Language Understanding (LSCP) Corpus
I am thrilled to announce our new study on informal language understanding, which will be presented at LREC 2020.
This is the first public contribution from our work on informal spoken Persian (Farsi) language understanding and on a multilingual corpus for the low-resource domain of spoken language. A language in its oral form is typically much more dynamic than in its written form. The written variety typically involves a higher level of formality, whereas the spoken form is characterised by many contractions and abbreviations. In formal written texts, longer and more complex sentences tend to be used, as the reader can re-read the troublesome parts if they lose track.
More information can be found at https://iasbs.ac.ir/~ansari/lscp/ and the corpus is available in the LINDAT/CLARIN-CZ repository via http://hdl.handle.net/11234/1-3195.
LSCP contains approximately 120M sentences from 27M casual Persian tweets, annotated with syntactic dependency relations, part-of-speech tags and sentiment polarity, along with translations into English, German, Czech, Italian and Hindi.
Linguistic Field(s): Computational Linguistics
Subject Language(s): Persian, Iranian (pes)
Language Family(ies): Iranian
Page Updated: 27-May-2020