LINGUIST List 26.4705

Thu Oct 22 2015

FYI: SemEval-2016 Task 3: Community Question Answering

Editor for this issue: Ashley Parker <ashley@linguistlist.org>


Date: 22-Oct-2015
From: Preslav Nakov <preslav.nakov@gmail.com>
Subject: SemEval-2016 Task 3: Community Question Answering

Call for Participation:

SemEval-2016 Task 3: Community Question Answering (cQA)
Website: http://alt.qcri.org/semeval2016/task3

Please check the updated description of the (sub)tasks.

Google Group: https://groups.google.com/forum/#!forum/semeval-cqa

- Evaluation period: January 10-31, 2016
- Paper submission: February 28, 2016

Register to participate at http://goo.gl/forms/cGkRocFFph

Summary:

- Task: Automatic Selection of Questions and Answers for cQA Ecosystems
- Given (i) a new question and (ii) a large collection of previously asked questions with their comment threads, created by a user community, rank the comments from these threads by their usefulness for answering the new question (i).

Main features:
- Related to a real-world application scenario:
-- the data comes from a real application, the Qatar Living forum, and
-- the systems solving the task would provide technology immediately usable in various cQA applications
- The first (to the best of our knowledge) large annotated corpus for question--question similarity defined by a cQA application

- Opportunity to model and explore a variety of research directions:
-- building systems for the individual components of cQA, which can then be merged into a more complex model that solves the overall task
-- designing question similarity components (typically needed in cQA) for improving answer selection (required by traditional QA)
-- modeling the interaction among the answers in threads for improving answer selection
-- modeling the interaction among threads using question--question similarity
- Related to textual inference tasks such as traditional QA, textual entailment, and semantic similarity, but in the more challenging environment of social media text

- Multilinguality: offered in Arabic and English

- Multiple domains:
-- typical questions asked when coming to a new country;
-- real-life questions from the medical domain.
- Designed to allow systems with no IR component to participate

Target: We aim to explore semantically oriented solutions based on rich language representations, to see whether they can improve over simpler bag-of-words and word-matching techniques.
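
For concreteness, here is a minimal, unofficial sketch (in Python, with made-up example data) of the kind of bag-of-words word-matching approach mentioned above: it ranks comments by the cosine similarity of their term-frequency vectors to the question. All names and data are illustrative; this is not an official baseline and not part of the task distribution.

# Minimal sketch of a bag-of-words baseline: rank comments for a question by
# cosine similarity of term-frequency vectors (word matching only, no semantics).
import math
import re
from collections import Counter

def bow(text):
    # Lowercased bag-of-words term-frequency vector.
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))

def cosine(v1, v2):
    # Cosine similarity between two sparse term-frequency vectors.
    dot = sum(v1[t] * v2[t] for t in set(v1) & set(v2))
    norm = math.sqrt(sum(c * c for c in v1.values())) * \
           math.sqrt(sum(c * c for c in v2.values()))
    return dot / norm if norm else 0.0

def rank_comments(question, comments):
    # Return the comments sorted by decreasing similarity to the question.
    q = bow(question)
    return sorted(comments, key=lambda c: cosine(q, bow(c)), reverse=True)

# Hypothetical usage with invented data:
question = "Where can I rent a car near the airport?"
comments = ["Try the rental desks in the arrivals hall.",
            "The weather is nice this week.",
            "Several car rental companies operate near the airport."]
print(rank_comments(question, comments))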

Subtasks:

- Subtask A (English): Question-Comment Similarity
Given a question and the first 10 comments in its question thread, rerank these 10 comments according to their relevance with respect to the question.
- Subtask B (English): Question-Question Similarity
Given a new question (aka original question) and the set of the first 10 related questions (retrieved by a search engine), rerank the related questions according to their similarity with the original question.
- Subtask C (English): Question-External Comment Similarity -- this is the main English subtask.
Given a new question (aka the original question) and the set of the first 10 related questions (retrieved by a search engine), each associated with the first 10 comments appearing in its thread, rerank the 100 comments (10 questions x 10 comments) according to their relevance with respect to the original question (a simple composition of component scores is sketched after this subtask list).
- Subtask D (Arabic): Rerank the correct answers for a new question.
Given a new question (aka the original question) and the set of the first 30 related questions (retrieved by a search engine), each associated with one correct answer (typically one or two paragraphs long), rerank the 30 question-answer pairs according to their relevance with respect to the original question.
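
As noted above, systems built for the individual components can be merged into a more complex model for the overall task. As an unofficial illustration (not part of the task definition), the Python sketch below composes a question-question score in the style of Subtask B with a question-comment score in the style of Subtask A, by simple multiplication, to rank the 100 external comments of Subtask C; the scoring functions sim_qq and sim_qc and the multiplicative combination are illustrative assumptions only.

# Illustrative sketch only: one possible way to combine component scores for
# Subtask C. sim_qq and sim_qc stand for hypothetical scoring functions,
# e.g. the bag-of-words similarity sketched earlier or any learned model
# developed for Subtasks B and A, respectively.

def rank_external_comments(original_q, related_threads, sim_qq, sim_qc):
    # related_threads: list of (related_question, [comment, ...]) pairs,
    # i.e. 10 related questions with 10 comments each in Subtask C.
    scored = []
    for related_q, comments in related_threads:
        qq = sim_qq(original_q, related_q)       # Subtask B style score
        for comment in comments:
            qc = sim_qc(original_q, comment)     # Subtask A style score
            scored.append((qq * qc, comment))    # one possible combination
    # Rank all 100 comments by the combined score, highest first.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [comment for _, comment in scored]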

More Information:

- For a longer introduction please refer to:
http://alt.qcri.org/semeval2016/task3
- For a precise definition of all subtasks and their associated evaluation see the Task Description page:
http://alt.qcri.org/semeval2016/task3/index.php?id=description-of-tasks
- The corpora and the tools can be downloaded from the Data and Tools page:
http://alt.qcri.org/semeval2016/task3/index.php?id=data-and-tools
- Register to participate here:
http://goo.gl/forms/cGkRocFFph

Finally, do not miss the important dates:

- Evaluation starts: January 10, 2016
- Evaluation ends: January 31, 2016
- Paper submission due: February 28, 2016 [TBC]
- Paper notification: early April, 2016 [TBC]
- Camera ready due: April 30, 2016 [TBC]
- SemEval workshop: Summer 2016

For questions, please check out our Google Group: semeval-cqa@googlegroups.com

Organizers:

Preslav Nakov, Qatar Computing Research Institute, HBKU
Lluís Màrquez, Qatar Computing Research Institute, HBKU
Alessandro Moschitti, Qatar Computing Research Institute, HBKU
Walid Magdy, Qatar Computing Research Institute, HBKU
James Glass, CSAIL-MIT
Bilal Randeree, Qatar Living

Acknowledgements:

- We would like to thank Hamdy Mubarak and Abdelhakim Freihat from QCRI, who contributed substantially to the data preparation.
- This research is carried out by the Arabic Language Technologies (ALT) group at the Qatar Computing Research Institute (QCRI), HBKU, part of the Qatar Foundation, in collaboration with MIT. It is part of the Interactive sYstems for Answer Search (Iyas) project.


Linguistic Field(s): Computational Linguistics

Page Updated: 22-Oct-2015