LINGUIST List 15.1670

Thu May 27 2004

Diss: Semantics/Syntax: Sofronie: 'Categorial...'

Editor for this issue: Takako Matsui <tako@linguistlist.org>


Directory

  • dsofronie, Categorial Grammars acquisition to simulate natural language learning...

    Message 1: Categorial Grammars acquisition to simulate natural language learning...

    Date: Thu, 27 May 2004 12:46:15 -0400 (EDT)
    From: dsofronie <dsofronie@free.fr>
    Subject: Categorial Grammars acquisition to simulate natural language learning...


    Institution: University of Lille, France

    Program: PhD

    Dissertation Status: Completed

    Degree Date: 2004

    Author: Daniela Sofronie

    Dissertation Title: Categorial Grammars acquisition to simulate natural language learning with semantic help

    Linguistic Field: Computational Linguistics, Semantics, Syntax, Text/Corpus Linguistics, Language Acquisition

    Subject Language: French (code: FRN)

    Dissertation Director 1: Remi Gilleron

    Dissertation Director 2: Isabelle Tellier

    Dissertation Director 3: Marc Tommasi

    Dissertation Abstract:

    Natural language acquisition remains a challenge for modern research, all the more so because the task calls for a multidisciplinary approach spanning cognitive science, linguistics and computer science. This thesis addresses one part of this vast field: the acquisition of the syntax of a language with the help of semantics, formalized as a process of grammatical inference. Formal language theory, logic and formal learning theory contribute three formal models: categorial grammars to represent syntax; Montague's logic, from which a simplified semantics is extracted; and Gold's model of identification in the limit from positive examples, which underpins the inference process. The choice of these models results from a survey of psycholinguistic and cognitive studies of child language acquisition, which support the following assumptions: acquisition takes place in the presence of positive examples only, and some semantic knowledge is either innate or can be extracted directly from the environment. Our research concentrated on the class of AB, or classical, categorial grammars, which in recent years has given rise to some interesting learnability results within Gold's model (mainly due to Kanazawa). This class deserves study because its members can generate all context-free languages and because the interface it provides with a semantic interpretation enables it to model certain characteristics of natural languages. However, the known learnability results either concern only some subclasses (the class of rigid grammars) or lead to prohibitively costly algorithms (the classes of k-valued grammars with k > 1).

    We define a new subclass of classical categorial grammars that is interesting both from a language-theoretic point of view (since its members can generate all the structured languages of classical categorial grammars) and from the point of view of machine learning (since it is learnable in Gold's model provided suitable data are supplied). To test the validity and effectiveness of our proposal, we built a corpus of French texts with semantic annotations. The results of the experiments are promising, especially with regard to the influence of certain factors such as the order of the sentences (from shortest to longest) and the redundancy of the vocabulary, which proves to be beneficial, confirming the assumptions.
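
For readers unfamiliar with the formalism, the following is a minimal sketch of an AB (classical) categorial grammar recognizer. It is an illustration only, not the learning algorithm or the subclass defined in the thesis. The names Fwd, Bwd, combine, parse, accepts and is_k_valued, as well as the toy French lexicon, are assumptions made for this example. A category A/B combines with a B on its right to give A; a category B\A combines with a B on its left to give A. A grammar is k-valued when each word has at most k categories, and rigid when k = 1.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Fwd:
    """Category A/B: expects a B on its right and yields A."""
    result: "Cat"
    arg: "Cat"


@dataclass(frozen=True)
class Bwd:
    """Category B\\A: expects a B on its left and yields A."""
    arg: "Cat"
    result: "Cat"


# Atomic categories are plain strings such as "s" or "np".
Cat = Union[str, Fwd, Bwd]


def combine(left, right):
    """Categories obtained from two adjacent categories by one AB rule."""
    out = set()
    if isinstance(left, Fwd) and left.arg == right:    # forward application
        out.add(left.result)
    if isinstance(right, Bwd) and right.arg == left:   # backward application
        out.add(right.result)
    return out


def parse(words, lexicon):
    """CYK-style chart: return every category derivable for the whole input."""
    n = len(words)
    # chart[i][j] holds the categories derivable for words[i:j]
    chart = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    for i, word in enumerate(words):
        chart[i][i + 1] = set(lexicon.get(word, ()))
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):
                for left in chart[i][k]:
                    for right in chart[k][j]:
                        chart[i][j] |= combine(left, right)
    return chart[0][n]


def accepts(words, lexicon, goal="s"):
    """True if the word sequence can be assigned the goal category."""
    return goal in parse(words, lexicon)


def is_k_valued(lexicon, k):
    """True if every word has at most k categories (k = 1 means rigid)."""
    return all(len(cats) <= k for cats in lexicon.values())


# Hypothetical toy lexicon, invented for this illustration.
LEXICON = {
    "Jean": {"np"},
    "Marie": {"np"},
    "dort": {Bwd("np", "s")},                 # np\s
    "aime": {Fwd(Bwd("np", "s"), "np")},      # (np\s)/np
}

if __name__ == "__main__":
    print(accepts("Jean aime Marie".split(), LEXICON))   # True
    print(accepts("aime Jean Marie".split(), LEXICON))   # False
    print(is_k_valued(LEXICON, 1))                       # True
```

In this sketch the lexicon is given in advance. The learning problem described in the abstract runs the other way: the lexicon is to be inferred from positive example sentences, with semantic information as an additional aid.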