Josiane Mothe

Professor - Information Systems, Big Data, Information Retrieval, Information Exploration and Machine Learning

Phone number: +33(0)5 61 55 64 44
Email: mothe [at] irit [dot] fr
Institut de Recherche en Informatique de Toulouse (IRIT) / SIG team
118, Route de Narbonne
31062 Toulouse Cedex 04

Proposals for theses, internships and post-docs

Post-doc: information retrieval, data mining and machine learning.

Thesis: deep learning, machine learning, applied to medical and agricultural fields, security, social network search.

Internships (bac+2, L, M): internships, with or without an industrial partner, in the fields of information retrieval, data mining, machine learning, neural networks, artificial intelligence and applied machine learning.

Interested in these topics?
=> Please send your CV by email.

The general theme is centred on information retrieval from textual, semi-structured or unstructured information and from images. The models, methods and developments aim at contextual access to raw or elaborated information that is relevant to users. The underlying problems concern: the representation of information, particularly its semantics; the management of the variety and dynamics of information, coupled with its volume; the definition and recognition of the characteristics of contexts; the adaptation of search processes to the context; and the elaboration of information by aggregation, using multi-dimensional exploratory analysis methods as well as machine learning methods, including deep learning.

This work implements IR systems based on theoretical models. We have proposed context-aware adaptive models based on data mining and machine learning approaches.

All of this work is validated through extensive experimental evaluation within international benchmark evaluation campaigns that provide large reference collections, either for IR tasks (TREC, the Text REtrieval Conference; CLEF, the Cross Language Evaluation Forum; INEX, the INitiative for the Evaluation of XML retrieval) or for information recommendation tasks (RecSys Challenge, Yandex challenge). In parallel, we validate our models on original problems submitted by industrial partners or public organisations, in particular in the context of European projects.

Thematic development

My scientific activity can be divided into five periods:

2017 to present:

I continue to collaborate with researchers in linguistics and mathematics, and internationally in information retrieval [8]. In the H2020 PREVISION project, for example, we have developed methods to automatically detect radicalisation in texts. I have also started to work on image mining [5]; the European InnEO "Space_PhD" (Earth observation), UNIVERSEH (space observation) and AI4AGRI (agriculture development) projects also address this challenge. The Région Observatoire de la Terre et Territoires en transition challenge that I co-lead has a regional and multidisciplinary focus. My collaborations in the medical field, on the analysis of images of cancer and head trauma, go in the same direction. The scientific challenges I aim to address are, on the one hand, the combination of text and image and, on the other hand, the links between space imagery and medical images, both in how they are processed and in the specific problems they raise.

2010-2016:

While continuing to collaborate with linguists, I started collaborating with researchers in mathematics. My research focuses on data analysis for information access, which I integrate into recommendation systems and information retrieval systems. The data I work with are varied: connection logs, web data, and data from social networks, in particular Twitter, a setting in which I co-lead a task in the international CLEF evaluation campaign. My research also addresses learning methods, such as sparse SVMs (publication [7]). Naturally, I am also interested in the specific problems that Big Data raises in information systems.

2001-2009:

After my habilitation (Habilitation à Diriger des Recherches), I started collaborating with researchers in linguistics and integrated into my work the representation of terms and knowledge in the form of ontologies. I obtained several grants in this context. Thanks to these skills, I participated in 3 European projects (IRAIA, e-Stage and WS-Talk), for which I was the IRIT leader. In addition, with a linguist from my university, I proposed criteria for predicting the difficulty of queries (publication [6], cited 137 times on Google Scholar).

1995-2000:

After my thesis, I worked on structured documents (SGML, then XML) and proposed new representations of texts, in particular documents from the Web. These representations make it possible to structure the texts and to apply OLAP visualisation methods originally intended for structured data from relational databases. I developed the DocCube prototype at that time (publication [4], cited 104 times on Google Scholar).

1991-1994:

During my DEA and my thesis, I worked on modelling information retrieval with a neural network approach. This kind of modelling, which comes from artificial intelligence, proved effective on small collections of unstructured texts. It is now making a strong comeback with deep learning techniques.

Adapting information retrieval models to contexts

The problems addressed are how to take into account the diversity of users and of their information needs, and the meaning conveyed by the textual content of information. Our models integrate these different aspects.

We account for the diversity of users' information needs and queries by adapting the models to contexts, in particular those related to the users' interests, preferences, environment or social networks. We have proposed machine learning methods to adapt the models to the contexts encountered. The meaning conveyed by the contents is taken into account by relying on existing or elicited resources (thesauri, ontologies, metadata, past queries, etc.).

The partnerships established over the last few years with colleagues in linguistics at the national level (CLLE Laboratory in Toulouse) and the international level (University of Perm, Russia, and University of Bucharest, Romania) have led to funded projects, joint publications and/or thesis co-supervision.

Some key results in the last 5 years:

Structured and unstructured data mining

My research in this field aims at designing methods and tools for the extraction and visualisation of elaborated information from varied, dynamic and voluminous data.

The starting point of the exploration is a massive collection, homogeneous or heterogeneous, structured or not. This can be a set of documents from a domain, analysed in order to map that domain, or other types of data, such as the performance of different systems on different user queries, analysed in order to extract typologies of systems or queries.

The analysis involves representing the information and then analysing the data to extract abstractions and global views that reveal the structure of the analysed information, its inter-relationships, its key elements and their correlations. We implement exploration and visualisation scenarios for interrelated information.

This work benefits from my partnership with mathematicians (3 co-supervised theses defended and more than 20 co-authored publications), as well as from links with industrial partners (3 theses in progress with Airbus Industrie) and international researchers (Italy, Canada, Germany, Armenia, Madagascar). The data analysed range from sensor data (cooperation with Airbus Defense and Space, 3 co-supervised theses) to social networks (collaboration with the University of Armenia, psychiatrists, H2020 project PREVISION), for the detection of weak signals, communities and atypical behaviours.

Some key results in the last 5 years:

Towards applied multimedia mining

More recently, thanks to the European H2020 FabSpace 2.0 and InnEO "Space_PhD" projects that I supervised, and then the Horizon Europe AI4AGRI project, I became interested in image analysis.

In this context, I proposed a task in the international CLEF evaluation campaign whose objective was to automatically estimate the population of a geographical region from images. This type of task relies on learning methods in which I have gained expertise through my past research (publication [11], cited more than 50 times on Google Scholar).

I have also initiated some work in the field of health and Earth observation:

Publications