Academic Paper

| Title: | Extraction of multi-word expressions from small parallel corpora |
| Author: | Yulia Tsvetkov |
| Institution: | Language Technologies Institute, Carnegie Mellon University |
| Author: | Shuly Wintner |
| Institution: | University of Haifa |
| Linguistic Field: | Computational Linguistics; Text/Corpus Linguistics |
| Abstract: | We present a general, novel methodology for extracting multi-word expressions (MWEs) of various types, along with their translations, from small, word-aligned parallel corpora. Unlike existing approaches, we focus on misalignments; these typically indicate expressions in the source language that are translated to the target in a non-compositional way. We introduce a simple algorithm that proposes MWE candidates based on such misalignments, relying on 1:1 alignments as anchors that delimit the search space. We use a large monolingual corpus to rank and filter these candidates. Evaluation of the quality of the extraction algorithm reveals significant improvements over naïve alignment-based methods. The extracted MWEs, with their translations, are used in the training of a statistical machine translation system, showing a small but significant improvement in its performance. |

This article appears in Natural Language Engineering Vol. 18, Issue 4.


