Research Projects

Recent and active funded projects

ANNODIS (Discourse Annotation: tools and reference corpus for French)

(ANR project 2007-2009) Discourse annotation: a reference corpus for French and tools to support annotation and exploitation

Project Website

PI: Marie-Paule Péry-Woodley

Partners: LILaC, CLLE-ERSS, Greyc


Scientific background and objectives:
The linguistic study of discourse is a dynamic and growing field, comprising research in descriptive linguistics, formal semantics and pragmatics, and computational linguistics or NLP. Nevertheless, it is far from being a mature branch of linguistics with well-understood paradigms and alternatives running through it. The structuring of discourse, a central theme, is for instance analysed by means of diverse notions: discourse relations, theme or topic, information structure, discourse framing, etc. Each hypothesis or theory, naturally enough, tends to concentrate on one aspect of a very complex phenomenon.
Given this state of the art, our idea is to develop an empirical program of discourse annotation at different levels and for different phenomena on a diversified corpus of French texts, in order to study discourse structure and its effects on interpretation collectively and from several points of view. We understand the task of discourse annotation in terms of two subtasks: the delimitation of segments of discourse and the determination of various sorts of hierarchical and semantic/pragmatic relations between these segments. This task cannot be undertaken on a large scale without the help of NLP tools for discourse annotation, which we intend to develop in this project.
The goal of this project is thus twofold: (i) build a corpus of discursively annotated texts; (ii) develop tools and interfaces to aid in discourse annotation, as well as automated tools for discourse analysis. The tools and the corpus will be made available to the linguistic and NLP communities. Our annotated corpus will help us test and generalize the hypotheses and existing approaches that have motivated our previous work in this area, and will help others test their own ideas. We expect the corpus to have an important impact on the interaction of theoretical, empirical and descriptive strands of research in this area.
We also expect the corpus and tools to benefit several practical areas of NLP: text summarization, question answering, textual entailment and information extraction, all of which should ideally take discourse structure into account.

Rhecitas: rhetorical citation analysis

(TGE-Adonis project 2008-2009)

PI: Ludovic Tanguy

Partners:

In the context of open-access archives, a large part of scientific output in the humanities is now available online to the general public. Access remains limited, however, because of a lack of standardization, indexing and flexible retrieval. Scientific publications can naturally be approached through the citation structure of papers, and some search engines (CiteSeer, Google Scholar) allow citation networks to be explored in a simplified way. This project's main objective is to refine citation relations through a rhetorical analysis of their mentions.

Voiladis: lexical neighbours and discourse analysis

(APR biennal 2008, PRES Université de Toulouse) VOIsinage Lexical pour L'Analyse du DIScours.

PI: Cécile Fabre

Partners:

Clementine Adam's Ph.D. is also being carried out within Voiladis.

Abstract: Discourse analysis is a growing concern in natural language processing, mainly as a way to improve exploration of and access to existing textual data. Discourse structure can be viewed, at its most basic, as the division of a text into segments that are functionally related within a coherent whole. This cohesion results from various factors that are implicitly or explicitly expressed (co-reference, markers, lexical continuity, etc.). This project aims to study the interplay between lexical use and discourse structure by bringing two perspectives together. From a global point of view, semantically close lexical items help determine large coherent segments, whereas at a more local level, typical lexical associations seem to be correlated with certain types of rhetorical relations between smaller segments. The project will mainly investigate the relevance and contribution of distributional similarity measures to both of these aspects.

Itipy: itinerary extraction from travel stories

ITIPY is a research project co-funded by the Aquitaine Regional Council, INRIA Bordeaux – Sud-Ouest and LIUPPA (2009-2012).

Partners:

Temporary project site