The SIG team (Systèmes d’Information Généralisés, Generalised Information Systems) is one of the most important teams in the IRIT laboratory (Institut de Recherche en Informatique de Toulouse), with 20 teacher-researchers based in 4 universities of the Occitanie region:

Université Toulouse 1 Capitole, Université Toulouse 2 Jean Jaurès, Université Toulouse 3 Paul Sabatier, and Université Jean François Champollion (Ecole ISIS, Castres).

The team also hosts about 40 postdoctoral researchers, PhD students, interns, research engineers, and associate researchers.


The research carried out by the SIG team is centred on data, a core component of modern information systems. Data is often massive (“Big Data”), produced in large quantities by humans or by systems such as satellite systems, social networks, medical imaging, sensors and video surveillance systems.

The SIG team’s research activities aim to design and develop methods, models, languages, algorithms and software tools that provide simple and efficient access to relevant information, improve its use, facilitate its analysis and assist humans in decision-making.

Our research covers the entire data processing chain, from raw data to elaborated data accessible to users who seek information, wish to visualise it, or perform decision-support, exploratory and predictive analyses.


In its research work, the SIG team uses a wide variety of raw data: corporate data (aeronautics, space, energy, biology, health…), scientific databases, document collections, Web and mobile application data (“User Generated Content”), open data, scientific benchmarks (CLEF, TREC, OAEI, TPC-H/DS…), knowledge bases or semantic data (ontologies), and data from sensors and connected objects (IoT).

These raw data are generally transformed into an elaborated form such as relational or multidimensional tables, matrix combinations, inverted files or indexes, univariate or multivariate time series, graphs or hypergraphs.

The value of the data is then exploited by data analysis, data mining, machine learning and deep learning algorithms to uncover the knowledge it hides.

Our theoretical and applied work produces results that contribute to the application areas and strategic actions that the IRIT laboratory has defined as priorities.

The research developed by the SIG team also covers various research areas as defined by the ACM Computing Classification System and deals in particular with three main scientific areas.

Database design and models

  • Non-conventional databases (column, document, graph, key-value)
  • Multi-model databases (data-store, poly-stores, multi-stores)
  • Schema inference, Data-driven modeling

Query languages

  • Query algebra
  • Polyglot querying
  • Schemaless querying

Information integration

  • Data matching
  • Data cleaning, Deduplication, Entity resolution
  • Extract-Transform-Load (ETL)

Information systems applications

  • Data lakes
    • Metadata management
  • Data warehouses
    • Multidimensional models
    • OLAP algebra
  • Democratic and Enterprise information systems
    • Enterprise knowledge graph modeling
    • Self-learning design

Document representation

  • Key-phrase extraction
  • Transformer-based (BERT) document content representations

Information retrieval query processing

  • Query adaptation, Query expansion
  • Query model adaptation, Multicore model

Recommender systems

  • Uplift models

Retrieval tasks and Evaluations

  • New measures
  • Resources for specific languages (Malagasy, Amharic)
  • Semantic search
  • Statistical test usage
  • Text simplification

Anomaly detection

  • Outlier detection in multi-variate time-series
  • Pattern-based anomaly detection
  • Physics-constrained deep learning
  • Time-series forecasting


Clustering

  • Fair clustering
  • High-dimensional clustering
  • Multi-view clustering

Explainable AI

  • Explanations of learning methods
  • Meta-learning, Automated machine learning

Federated Learning

  • Federated clustering
  • Time-series forecasting

Text Mining

  • Graph-based analysis of documents
  • Novelty detection in semi-structured data