Research Projects

Recent and active funded projects



SLANT: Spin and Bias in Language Analyzed in News and Text

ANR PRCI 2020-2022

Project Website

PI: Philippe Muller

There is growing concern about misinformation and biased information in public communication, whether in traditional media or social forums. While automated fact-checking has received a lot of attention, the problem of fair information is much larger and includes more insidious forms such as biased presentation of events and discussion. The SLANT project aims to characterise bias in textual data, whether intended, as in public reporting, or unintended, as in writing that aims at neutrality. An abstract model of biased interpretation, building on work on discourse structure, semantics and interpretation, will be complemented and made concrete by finding relevant lexical, syntactic, stylistic or rhetorical differences through an automated but explainable comparison of texts with different biases on the same subject, based on a dataset of news media coverage from a diverse set of sources. We will also explore how our results can help alter bias in texts or remove it from automated representations of texts.

Partners: IRIT(MELODI), INRIA-Lille(Magnet), University of Luxembourg

DESCARTES: A CNRS@CREATE Program on Intelligent Modelling for Decision-making in Critical Urban Systems

A France-Singapore collaboration project

Project Website

The DesCartes program aims to develop disruptive hybrid AI to serve the smart city and to enable optimized decision-making in the complex situations encountered in critical urban systems. Hybrid AI leaps forward, beyond the current black-box procedures used in fully data-driven AI, by integrating meanings and semantics in the following way. First, it embraces the “analog” and “digital” worlds, combining physics knowledge with AI-based data-driven models, giving rise to the novel Hybrid Twins concept within physics-aware AI, which will be the main building block of intelligent modelling. Second, it combines reasoning, using high-level concepts, with the more traditional machine-learning approach. This “hybrid” paradigm reduces the need for big data and other resources, in particular when producing intelligent modelling that empowers decision-making technologies; it also makes trustworthy AI easier to achieve and, more generally, leads to a responsible Hybrid AI. Last but not least, a key aspect of DesCartes is to bridge Hybrid AI with humans as observers, players and decision-makers, to realize the people-centric vision of smart cities. Depending on the application, hybrid AI could be the driver for decisions, with humans as supervisors; or humans could be the drivers, with hybrid AI supporting them by assisting the decision process.

Quantum: Question Generation for Textual Understanding via Machine Reading

ANR PRCE 2020-2022

Project Website

Today's overabundance of textual data makes it difficult to find the correct answer to a user query. In this context, Question Generation (the ability to automatically generate questions from a document) is rapidly gaining traction as a key technology. This project aims to investigate the task of passage-level abstractive question generation: we will focus on how to generate questions whose answer is distributed in the text and where the words that make up the question are not necessarily present in that text. We will develop machine reading approaches that take into account the structure of the document (both typographically and rhetorically), and generate complex questions that inherently rely on this structure. In order to train our models, we will construct relevant datasets and annotations with limited supervision. Our models will be evaluated both intrinsically and by integrating them into a conversational agent as an in-vivo testbed.

PI: Lucile Callebert


Datcha: Knowledge extraction from large corpora of human-human conversation data from web chat services

Datcha is an ANR project (2016-2019)

Project Website

PI: Frédéric Béchet

The goal of the DATCHA project is to perform knowledge extraction from very large databases of web chat conversations between operators and clients in customer contact centers. Extracting knowledge from chat corpora is a challenging research issue. Simply applying traditional text mining tools is clearly sub-optimal, as it takes into account neither the interaction dimension nor the particular nature of this language, which shares properties of both spoken and written language. The DATCHA project will address scientific issues including intra-conversation analysis through a deep semantic analysis (syntactic, semantic, discursive and structural analysis) and inter-conversation analysis (definition of semantic and discursive similarity between conversations). It will propose innovative solutions in various use cases, including analytics report generation, conversation success prediction on the basis of criteria defined by operational units, and online conversation solving.



Industrial: Orange Labs


TextLink: Structuring Discourse in Multilingual Europe

TextLink is a COST action (IS1312)

Project Website

Chair: Liesbeth Degand

The TextLink Action will facilitate European multilingualism by (1) identifying and creating a portal into such resources within Europe, including annotation tools, search tools, and discourse-annotated corpora; (2) delineating the dimensions and properties of discourse annotation across corpora; (3) organising these properties into a sharable taxonomy; (4) encouraging the use of this taxonomy in subsequent discourse annotation and in cross-lingual search and studies of devices that relate and structure discourse; and (5) promoting use of the portal, its resources and sharable taxonomy.


Asfalda is an ANR project (2012-2015)

Project Website

PI: Marie-Hélène Candito

The ASFALDA project aims to provide both a French corpus with semantic annotations and automatic tools for shallow semantic analysis, using machine learning techniques to train analyzers on this corpus. The target semantic annotations can be characterized roughly as making explicit “who does what, when and where”, in a way that abstracts away from word order and syntactic variation, and from some of the lexical variation found in natural language.


Academic: Alpage, IRIT(MELODI), LIF, LLF

Industrial: Ant'inno, CEA LIST


Polymnie is an ANR project (2012-2015)

Project Website

PI: Sylvain Pogodalla (Semagramme)

Polymnie focuses on studying and implementing the modeling of sentences and discourses in a compositional paradigm that takes into account their dynamics and their structures, both in parsing and in generation. To that end, we rely on the ACG framework. The kinds of processing we are interested in relate to the automatic construction of summaries and to text simplification. This has to be considered within the limits of modelling the linguistic processes (as opposed to inferential processes, for instance) that these tasks involve.


Partners: Semagramme, Alpage, Melodi, Signes

STAC: Strategic conversation

STAC is an ERC advanced grant (2011-2015)

Project Website

PI: Nicholas Asher

Partners: IRIT, Edinburgh University, Heriot-Watt University, INRIA

STAC is a five-year interdisciplinary project that aims to develop a new, formal and robust model of conversation, drawing on ideas from linguistics, philosophy, computer science and economics.

ANNODIS (Discourse Annotation: tools and reference corpus for French)

(ANR project 2007-2009) Discourse annotation: a reference corpus for French and tools to support annotation and corpus exploitation

Project Website

PI: Marie-Paule Péry-Woodley

Partners: MELODI, CLLE-ERSS, Greyc

Scientific background and objectives:
The linguistic study of discourse is a dynamic and growing field, comprising research in descriptive linguistics, formal semantics and pragmatics, and computational linguistics or NLP. Nevertheless, it is far from being a mature branch of linguistics with well-understood paradigms and alternatives running through it. The structuring of discourse, a central theme, is for instance analysed by means of diverse notions: discourse relations, theme or topic, information structure, discourse framing, etc. Each hypothesis or theory, naturally enough, tends to concentrate on one aspect of a very complex phenomenon. Given this state of the art, our idea is to develop an empirical program of discourse annotation at different levels and for different phenomena on a diversified corpus of French texts, in order to study the issue of discourse structure and its effects on interpretation from several points of view in a collective fashion. We understand the task of discourse annotation in terms of two subtasks: the delimitation of segments of discourse and the determination of various sorts of hierarchical and semantic/pragmatic relations between these segments. This task cannot be undertaken at a large scale without the help of tools for discourse annotation that come from NLP. We intend to develop such tools in this project. The goal of this project is thus twofold: (i) build a corpus of discursively annotated texts; (ii) develop tools and interfaces to aid in discourse annotation, as well as automated tools for discourse analysis. The tools and the corpus will be made available to the linguistic and NLP communities. Our annotated corpus will help us test and generalize the hypotheses and existing approaches that have motivated our previous work in this area, as well as help others test their own ideas. We expect the corpus to have an important impact on the interaction of theoretical, empirical and descriptive strands of research in this area.
We also expect this corpus and these tools to benefit several practical areas of NLP: text summarization, question answering systems, textual entailment and information extraction, all of which ideally must take discourse structure into account.

Rhecitas: rhetorical citation analysis

(TGE-Adonis project 2008-2009)

PI: Ludovic Tanguy

In the context of open-access archives, a large part of scientific production in the humanities is now available to the general public online. Access is however limited because of a lack of standardization, indexing or flexible retrieval. Scientific publications can naturally be approached by exploiting the citation structures of papers, and some search engines (CiteSeer, Google Scholar) allow for the exploration of citation networks, in a simplified way. This project's main objective is to refine citation relations through a rhetorical analysis of their mentions.

Voiladis: lexical neighbours and discourse analysis

(Biennial APR 2008, PRES Université de Toulouse) VOIsinage Lexical pour L'Analyse du DIScours (lexical neighbourhood for discourse analysis).

PI: Cécile Fabre

Clementine Adam's Ph.D. takes place within Voiladis.

Abstract: Discourse analysis is a growing concern in the context of natural language processing, mainly to improve exploration of and access to existing textual data. Discourse structure can be considered basically as the division of a text into segments that are functionally related in a coherent ensemble. This cohesion results from various factors, implicitly or explicitly expressed (co-references, markers, lexical continuity, etc.). This project aims to study the interplay of lexical use and discourse structure by bringing two perspectives together. From the global point of view, semantically close lexical items help determine large coherent segments, whereas at a more local level, typical lexical associations seem to be correlated with certain types of rhetorical relations between smaller segments. The project will mainly investigate the relevance and contribution of distributional similarity measures to both these aspects.

Itipy: itinerary extraction from travel stories

ITIPY is a research project co-funded by the Conseil régional d'Aquitaine, INRIA Bordeaux – Sud-Ouest and LIUPPA (2009-2012).

Temporary project site