ANNODIS (Discourse Annotation: tools and reference corpus for French)
(ANR project, 2007-2009) Discourse annotation: a reference corpus for French and tools to assist annotation and exploitation
PI: Marie-Paule Péry-Woodley
Partners: LILaC, CLLE-ERSS, Greyc
Scientific background and objectives:
The linguistic study of discourse is a dynamic and growing field, comprising research in descriptive linguistics, formal semantics and pragmatics, and computational linguistics or NLP. Nevertheless, it is far from being a mature branch of linguistics with well-understood paradigms and alternatives running through it. The structuring of discourse, a central theme, is for instance analysed by means of diverse notions: discourse relations, theme or topic, information structure, discourse framing, etc. Each hypothesis or theory, naturally enough, tends to concentrate on one aspect of a very complex phenomenon.
Given this state of the art, our idea is to develop an empirical program of discourse annotation at different levels and for different phenomena, on a diversified corpus of French texts, so as to study discourse structure and its effects on interpretation collectively and from several points of view. We understand discourse annotation as two subtasks: the delimitation of discourse segments, and the determination of the various hierarchical and semantic/pragmatic relations between these segments. This task cannot be undertaken on a large scale without NLP tools for discourse annotation, which we intend to develop in this project.
The goal of this project is thus twofold: (i) to build a corpus of discursively annotated texts; (ii) to develop tools and interfaces that aid discourse annotation, as well as automated tools for discourse analysis. The tools and the corpus will be made available to the linguistic and NLP communities. Our annotated corpus will help us test and generalize the hypotheses and existing approaches that have motivated our previous work in this area, and will help others test their own ideas. We expect the corpus to have an important impact on the interaction between the theoretical, empirical and descriptive strands of research in this area.
We also expect the corpus and tools to benefit several practical areas of NLP, such as text summarization, question answering, textual entailment and information extraction, all of which should ideally take discourse structure into account.
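The two annotation subtasks described above (delimiting segments, then relating them) suggest a simple data model. The following is a hypothetical illustration only, not the ANNODIS annotation format: segments are assumed to be character-offset spans, relations are assumed to be directed and labelled, and the class names and the relation label "Explanation" are chosen purely for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class Segment:
    """A discourse segment: the character span [start, end) of the text."""
    seg_id: str
    start: int
    end: int


@dataclass(frozen=True)
class Relation:
    """A directed, labelled relation between two segments (identified by seg_id)."""
    label: str
    source: str
    target: str


@dataclass
class DiscourseAnnotation:
    """Hypothetical container for one annotated text."""
    text: str
    segments: Dict[str, Segment] = field(default_factory=dict)
    relations: List[Relation] = field(default_factory=list)

    def add_segment(self, seg_id: str, start: int, end: int) -> Segment:
        # Subtask 1: delimit a segment; reject empty or out-of-range spans.
        if not 0 <= start < end <= len(self.text):
            raise ValueError(f"invalid span [{start}, {end})")
        if seg_id in self.segments:
            raise ValueError(f"duplicate segment id {seg_id!r}")
        segment = Segment(seg_id, start, end)
        self.segments[seg_id] = segment
        return segment

    def add_relation(self, label: str, source: str, target: str) -> Relation:
        # Subtask 2: a relation may only link segments that have already been delimited.
        for seg_id in (source, target):
            if seg_id not in self.segments:
                raise KeyError(f"unknown segment {seg_id!r}")
        relation = Relation(label, source, target)
        self.relations.append(relation)
        return relation

    def span_text(self, seg_id: str) -> str:
        """Return the surface text of a segment."""
        segment = self.segments[seg_id]
        return self.text[segment.start:segment.end]


if __name__ == "__main__":
    doc = DiscourseAnnotation("Max fell. John pushed him.")
    doc.add_segment("s1", 0, 9)
    doc.add_segment("s2", 10, 26)
    doc.add_relation("Explanation", "s1", "s2")
    for rel in doc.relations:
        # prints: Explanation | Max fell. | John pushed him.
        print(rel.label, "|", doc.span_text(rel.source), "|", doc.span_text(rel.target))
```

Hierarchical information (for instance, whether a relation coordinates or subordinates its arguments) could be added as a further field on `Relation`; it is left out here to keep the sketch short.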
Rhecitas: rhetorical citation analysis
(TGE-Adonis project, 2008-2009)
PI: Ludovic Tanguy
Partners:
- CLLE-ERSS (UMR 5263-Université de Toulouse)
- INIST (CNRS, Nancy)
- IRIT (UMR 5505-Université de Toulouse) SIG and LILAC groups
- SYNAPSE Développement
Voiladis: lexical neighbours and discourse analysis
(APR 2008 biennial call for projects, PRES Université de Toulouse) VOIsinage Lexical pour L'Analyse du DIScours (lexical neighbourhood for discourse analysis).
PI: Cécile Fabre
Partners:
- CLLE-ERSS (UMR 5263-Université de Toulouse): Cécile Fabre, Marie-Paule Péry-Woodley, Josette Rebeyrolles, Myriam Bras
- IRIT (UMR 5505-Université de Toulouse) LILAC: Philippe Muller, Nicholas Asher
Abstract: Discourse analysis is a growing concern in natural language processing, mainly as a means to improve the exploration of, and access to, existing textual data. Discourse structure can basically be seen as the division of a text into segments that are functionally related within a coherent whole. This cohesion results from various factors, expressed implicitly or explicitly (coreference, discourse markers, lexical continuity, etc.). This project studies the interplay between lexical use and discourse structure by bringing two perspectives together. From a global point of view, semantically close lexical items help identify large coherent segments, whereas at a more local level, typical lexical associations seem to be correlated with certain types of rhetorical relations between smaller segments. The project will mainly investigate the relevance and contribution of distributional similarity measures to both of these aspects.
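One common family of distributional similarity measures represents each word by counts of the words it co-occurs with, and compares these vectors with cosine similarity. The sketch below is an illustration under stated assumptions, not the measure used in Voiladis: it assumes pre-tokenized sentences and a window of two words on each side, and `block_cohesion` (the average, over the words of one block, of each word's best match in a neighbouring block) is a hypothetical scoring choice. A low score between adjacent blocks could be read as a hint of a segment boundary.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List


def cooccurrence_vectors(sentences: List[List[str]], window: int = 2) -> Dict[str, Counter]:
    """Map each word to a Counter of the words occurring within `window` positions of it."""
    vectors: Dict[str, Counter] = defaultdict(Counter)
    for tokens in sentences:
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return dict(vectors)


def cosine(u: Counter, v: Counter) -> float:
    """Cosine similarity of two sparse count vectors; 0.0 if either is empty."""
    dot = sum(count * v[key] for key, count in u.items())  # Counter yields 0 for missing keys
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def block_cohesion(block_a: List[str], block_b: List[str], vectors: Dict[str, Counter]) -> float:
    """Hypothetical score: for each word of block_a, take its most similar word in block_b, then average."""
    if not block_a or not block_b:
        return 0.0
    empty = Counter()  # words unseen in the corpus get an empty vector (similarity 0.0)
    best = [max(cosine(vectors.get(a, empty), vectors.get(b, empty)) for b in block_b)
            for a in block_a]
    return sum(best) / len(best)


if __name__ == "__main__":
    corpus = [
        ["the", "cat", "drinks", "milk"],
        ["the", "dog", "drinks", "water"],
        ["the", "cat", "eats", "fish"],
        ["the", "dog", "eats", "meat"],
    ]
    vecs = cooccurrence_vectors(corpus)
    print(round(cosine(vecs["cat"], vecs["dog"]), 2))   # 0.75: shared contexts "the", "drinks", "eats"
    print(round(cosine(vecs["cat"], vecs["milk"]), 2))  # 0.25
```

Raw counts are used for simplicity; weighting schemes such as pointwise mutual information, and larger corpora, would normally be needed for similarities that reflect semantic closeness reliably.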
Itipy: itinerary extraction from travel stories
ITIPY is a research project co-funded by the Aquitaine Regional Council, INRIA Bordeaux – Sud-Ouest and LIUPPA (2009-2012).
Partners:
- Labri-Signes, INRIA, Univ. Bordeaux
- LIUPPA, Univ. Pau
- IRIT (UMR 5505-Université de Toulouse) LILAC: Philippe Muller, Nicholas Asher