Home - Equipe SIG

The SIG team (Systèmes d’Information Généralisés, or Generalised Information Systems) is one of the most important teams in the IRIT (Institut de Recherche en Informatique de Toulouse) laboratory, with 20 teacher-researchers based in 4 universities of the Occitanie region:

Université Toulouse 1 Capitole, Université Toulouse 2 Jean Jaurès, Université Toulouse 3 Paul Sabatier, and Université Jean François Champollion (Ecole ISIS, Castres).

The team also hosts about 40 postdoctoral researchers, PhD students, interns, research engineers and associate researchers.

OUR RESEARCH

The SIG team's research is centered on data, a core component of modern information systems. Data is often massive (“Big Data”), produced in large quantities by humans or by systems such as satellites, social networks, medical imaging devices, sensors and video surveillance systems.

The SIG team’s research activities aim to design and develop methods, models, languages, algorithms and software tools that provide simple and efficient access to relevant information, improve its use, facilitate its analysis and assist humans in decision-making.

Our research covers the entire data processing chain, from raw data to elaborated data accessible to users who seek information, wish to visualise it, or want to perform decisional, exploratory and predictive analyses.

OUR WORK

In its research work, the SIG team uses a wide variety of raw data: corporate data (aeronautics, space, energy, biology, health…), scientific databases, document collections, the Web and mobile applications (“User Generated Content”), open data, scientific benchmarks (CLEF, TREC, OAEI, TPC-H/DS…), knowledge bases and semantic data (ontologies), and sensors and connected objects (IoT).

These raw data are generally transformed into an elaborated form such as relational or multidimensional tables, matrix combinations, inverted files or indexes, univariate or multivariate time series, graphs or hypergraphs.

The value of the data is then exploited by data analysis and data mining algorithms, machine learning and deep learning to uncover the knowledge it hides.

Our theoretical and applied work produces contributions to the application areas and strategic actions that the IRIT laboratory has defined as priorities:

Aeronautics, Space, Transportation
Computing, Mass Data, AI
Heritage and people safety
Social media, digital social ecosystems
e-Education for learning and teaching
Health, Autonomy, Living, Well-being
Smart city

The research developed by the SIG team also covers various research areas as defined by the ACM Computing Classification System and deals in particular with three main scientific areas.

Data Management

Information Retrieval

Machine learning for applications

Data management

Database design and models

Non-conventional databases (column, document, graph, key-value)
Multi-model databases (data stores, polystores, multistores)
Schema inference, Data-driven modeling

Query languages

Query algebra
Polyglot querying
Schemaless querying

Information integration

Data matching
Data cleaning, Deduplication, Entity resolution
Extract-Transform-Load (ETL)

Information systems applications

Data lakes
- Metadata management
Data warehouses
- Multidimensional models
- OLAP algebra
Democratic and Enterprise information systems
- Enterprise knowledge graph modeling
- Self-learning design

Information Retrieval

Document representation

Key-phrase extraction
Transformer-based (BERT) document content representations

Information retrieval query processing

Query adaptation, Query expansion
Query model adaptation, Multicore model

Recommender systems

Uplift models

Retrieval tasks and Evaluations

New measures
Resources for specific languages (Malagasy, Amharic)
Semantic search
Statistical test usage
Text simplification

Machine learning for applications

Anomaly detection

Outlier detection in multivariate time series
Pattern-based anomaly detection
Physics-constrained deep learning
Time-series forecasting

Clustering

Fair clustering
High-dimensional clustering
Multi-view clustering

Explainable AI

Explanations of learning methods
Meta-learning, Automated machine learning

Federated Learning

Federated clustering
Time-series forecasting

Text Mining

Graph-based analysis of documents
Novelty detection in semi-structured data