SIG Team

Head: Olivier TESTE

The SIG team (Generalized Information Systems, « Systèmes d’Informations Généralisés ») has existed since 2003, the year the IRIT lab was founded. It is one of the largest teams of the lab, with 21 permanent members (regular researchers) and around thirty non-permanent members such as postdocs, PhD students, interns and research engineers. We are spread over the four universities of the Western Occitanie region: Toulouse 1 Capitole University, University of Toulouse – Jean Jaurès, Paul Sabatier University and Jean-François Champollion University (via the ISIS school in Castres).

Our research area is data, and particularly data management and large-scale data processing (Big Data). We propose methods, models, languages and tools for simple, effective and efficient access to qualified and relevant information. The final goal is to enhance information usage, ease information analysis and support the decision-making process.

Our research is applied to a great variety of datasets: scientific databases, business databases (aeronautics, space, energy, biology, health, etc.), the Web, ambient and mobile applications (such as user-generated content), open data, scientific benchmarks (CLEF, OAEI, SSB, TPC-H/DS, TREC, etc.), semantic and knowledge data (such as ontologies), sensors and connected objects (IoT), and more.

Our research covers the whole data processing chain, from raw data to elaborated data made accessible to users who search for information, visualize synthetic views, and perform decisional and predictive analyses.

 

Figure 1: Data processing chain.

This research is organized along four axes.

Automatic Integration of heterogeneous data

The data available today form datasets that are voluminous (mass data), most often heterogeneous, with a large diversity of structures (structured, semi-structured and even unstructured data), and widely distributed. Our work addresses different aspects of data heterogeneity: entity heterogeneity, structural heterogeneity, and the syntactic and semantic heterogeneity of data pieces.

The challenge is to propose methods and algorithms able to automatically match elements from multiple data or knowledge sources (holistic matching). The targeted matchings may be simple (1 to 1), multiple (1 to n or n to 1), or even complex (n to m).
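
To make the three kinds of matching concrete, here is a minimal sketch, not the team's method. It assumes correspondences are already given as (source element, target element) pairs, groups them into connected components, and labels each component by its cardinality. The function name `classify_matchings` and the example schema elements are hypothetical.

```python
from collections import defaultdict


def classify_matchings(pairs):
    """Group (source, target) correspondences into connected components and
    label each component '1:1', '1:n', 'n:1' or 'n:m'."""
    # Tag each node with its side so that a source and a target sharing the
    # same name remain distinct nodes of the bipartite graph.
    adjacency = defaultdict(set)
    for source, target in pairs:
        adjacency[("src", source)].add(("tgt", target))
        adjacency[("tgt", target)].add(("src", source))

    seen = set()
    components = []
    for start in list(adjacency):
        if start in seen:
            continue
        # Depth-first traversal collects every element linked to `start`.
        sources, targets = set(), set()
        stack = [start]
        seen.add(start)
        while stack:
            node = stack.pop()
            side, name = node
            (sources if side == "src" else targets).add(name)
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        if len(sources) == 1 and len(targets) == 1:
            kind = "1:1"
        elif len(sources) == 1:
            kind = "1:n"
        elif len(targets) == 1:
            kind = "n:1"
        else:
            kind = "n:m"
        components.append((sorted(sources), sorted(targets), kind))
    return components


# Hypothetical correspondences between two schemas.
pairs = [
    ("id", "identifier"),
    ("address", "street"), ("address", "city"),
    ("first_name", "name"), ("last_name", "name"),
    ("price", "amount"), ("price", "currency"), ("tax", "amount"),
]
for sources, targets, kind in classify_matchings(pairs):
    print(kind, sources, "<->", targets)
```

In this sketch a complex (n to m) matching is simply a component with several elements on both sides; finding the correspondences themselves is the hard part and is not shown.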

Non-conventional database management

Modern database management systems are expected to manage huge amounts of data. These data always embed a wide variety of forms (classical relational data, structured documents such as XML or JSON, textual data, domain ontologies, etc.). Such systems can no longer rely on a standard, uniform data model (i.e. relational). Instead, they are built over centralized (data warehouses, data lakes) and distributed storage systems based on non-conventional data model paradigms, such as key-value, document-oriented, column-oriented or graph-oriented data models, also called NoSQL (not only SQL).
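
As an illustration of how the four paradigms differ, the sketch below represents one small record under each of them using plain Python structures. It is a simplified picture under stated assumptions: the record, the field names and the helper functions are hypothetical, and real NoSQL systems add storage, distribution and query layers that are not modelled here.

```python
import json

# Hypothetical user record used only for illustration.
user_id, name, city, followed = "u42", "Ada", "Toulouse", "u7"

# Key-value: the store only sees an opaque value under one key.
key_value = {
    user_id: json.dumps({"name": name, "city": city, "follows": [followed]})
}

# Document-oriented: nested fields are visible to the store and queryable.
documents = [
    {"_id": user_id, "name": name, "address": {"city": city}, "follows": [followed]}
]

# Column-oriented: row key -> column family -> column -> value.
columns = {
    user_id: {
        "profile": {"name": name, "city": city},
        "social": {"follows:" + followed: "1"},
    }
}

# Graph-oriented: entities are nodes, relationships are labelled edges.
nodes = {user_id: {"name": name}, followed: {}, "c1": {"label": "City", "name": city}}
edges = [(user_id, "FOLLOWS", followed), (user_id, "LIVES_IN", "c1")]


def city_from_key_value(uid):
    # The whole value must be decoded before one field can be read.
    return json.loads(key_value[uid])["city"]


def city_from_document(uid):
    # A nested field is addressed by its path inside the document.
    return next(d["address"]["city"] for d in documents if d["_id"] == uid)


def city_from_columns(uid):
    # A single column of a single family is read for the row.
    return columns[uid]["profile"]["city"]


def city_from_graph(uid):
    # The answer is reached by following a labelled edge to another node.
    return next(
        nodes[dst]["name"]
        for src, relation, dst in edges
        if src == uid and relation == "LIVES_IN"
    )
```

The same question ("which city does this user live in?") is answered by decoding a value, following a path, reading a column or traversing an edge, which is why a single uniform data model and language no longer fits all four.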

In this context, the challenge is, on the one hand, to develop novel design methods that promote explicitly formalized data representation models (concepts and formalisms). On the other hand, such data models require formalized languages for manipulating and processing data. Such languages should rest on a closed algebraic core of elementary operators whose completeness can be proven, in order to ensure coverage of the model and to guarantee the validity and expressive power of the language.

User-oriented data

Complex systems that become more efficient by adapting themselves strongly require some knowledge about the user. This knowledge is often stored in a user profile, that is, a set of data characterizing the user (their context, their habits and their way of using the system).

In this context, the challenge is to define contextual profiles (spatio-temporal, egocentric, etc.) for each user or set of users (group). These profiles are then used to propose new models or algorithms for recommender systems or information filtering systems. Profile construction techniques are also used in social network analysis (community, fraud, influence or sentiment detection).

Analysis, learning and mining in huge amounts of data

The big data era has completely renewed computer science. Today, humanity produces huge amounts of data on the global Internet, in the Internet of Things, and in scientific observation and recording facilities (satellites, particle accelerators, DNA sequencers, etc.). Novel algorithms are now built to be executed over clusters of computers. They enable analysis, mining (predictions) and simulation over massive data.

The SIG team works on the parametrization of analysis and data mining algorithms, as well as on machine learning and deep learning algorithms. “Data intelligence” is both an issue and a challenge of data science, and it greatly depends on the efficiency of the underlying algorithms and models. “Data intelligence” approaches must guarantee the best possible reproducibility of results. This reproducibility is hard to achieve because very large and heterogeneous datasets are most often of poor (or even random) quality and are built on sparse and imbalanced data distributions. These characteristics require precise tuning of the algorithms used, which makes each approach very specific to a reduced data subset.

Skills

design and modeling of non-conventional databases: data warehouses, OLAP, data lakes, NoSQL storage
automatic integration of heterogeneous data and knowledge (data matching, ontology alignment)
management of complex documents
social media analysis
contextualized processes of information retrieval
recommendation systems, chatbots
data mining, machine learning and deep learning in multimodal data masses

Members of the SIG team

Permanent members
Non-permanent members

Publications of the SIG team

Articles in international journals
Articles in national journals
Edited special issues of journals
International peer-reviewed conferences and workshops with published proceedings
Conferences without published proceedings
Books (monographs)
Theses and habilitations
  • Thi Bich Ngoc Hoang

    Diffusion d’information, extraction d’information et de connaissance dans les réseaux sociaux

    PhD thesis, Université de Toulouse-le-Mirail, 2018.

  • Amine El Haddadi

    Conception et développement d’un système d’intelligence économique (SIE) pour l’analyse de big data dans un environnement de cloud computing

    PhD thesis, Université Paul Sabatier, 2018.

  • Amal Ait Brahim

    Approche dirigée par les modèles pour l’implantation de bases de données massives sur des SGBD NoSQL

    PhD thesis, Université de Toulouse, October 2018.

  • Abdelhamid Chellal

    Event Summarization on Social Media Stream: Retrospective and Prospective Tweet Summarization

    PhD thesis, Université Paul Sabatier, September 2018.

    Access: https://www.irit.fr/publis/IRIS/2018_These-CHELLAL.pdf

  • Mahdi Washha

    Information Quality in Online Social Media and Big Data Collection: An Example of Twitter Spam Detection

    PhD thesis, Université Paul Sabatier, July 2018.

  • William Raynaut

    Perspectives de Méta-Analyse pour un Environnement d’aide à la Simulation et Prédiction

    PhD thesis, Université de Toulouse, January 2018.

    Access: ftp://ftp.irit.fr/IRIT/SIG/2018_These_Raynaut.pdf

  • Kiswendsida Kisito Kaboré

    Système d’aide pour l’accès non supervisé aux unités documentaires

    PhD thesis, Université Paul Sabatier, 2017.

  • Jeremy Bascans

    Modèles de mémoires d’entreprise avec intégration automatique d’informations

    PhD thesis, Université Paul Sabatier, October 2017.

  • Jiefu Song

    Business Intelligence Enhanced by the Web of Data

    PhD thesis, Université de Toulouse, December 2017.

    Access: https://www.irit.fr/publis/SIG/J.SONG_Manuscrit.pdf

  • Anass El Haddadi

    Big Data Mining: de l’extraction des données à leur visualisation

    Habilitation à diriger des recherches (HDR), Université Abdelmalek Essaadi, Tétouan, Morocco, March 2017.

Reports

Contracts of the SIG team

Acronym / Title / Period / Scientific leaders / Partners

FILTER 2
    Title: Filtrage négatif des contenus de vidéo protection
    Period: 2016-2020
    Scientific leader: Sedes, Florence
    Partners: CNRS (EPCST) – THC/Thales Communication (THC)(Société Anonyme) – Université de Poitiers(EPCSCP) – Préfecture de Police(Administration)

ARCSYS [Contract completed]
    Title: Accès et recollection dans les systèmes d’information complexes
    Period: 2013-2015
    Scientific leader: Chevalier, Max
    Partners: CNRS (EPCST) – Université Paris VI(EPCSCP) – CNRS / Délégation Midi Pyrénées (EPCST)

INCOME [Contract completed]
    Title: Infrastructure logicielle de gestion de contexte multi-échelle pour l’internet des objets
    Period: 2012-2015
    Scientific leaders: Arcangeli, Jean-Paul; Desprats, Thierry; Peninou, André
    Partners: Institut Telecom(EPCSCP)

METHODEO [Contract completed]
    Title: Méthodologie de tests et définition de métriques pour l’évaluation d’algorithmes pour la vidéoprotection
    Period: 2011-2013
    Scientific leader: Sedes, Florence
    Partners: THC/Thales Communication (THC)(Société Anonyme) – CEA/LIST(Laboratoire) – Supélec(EPCSCP) – Thales Services SAS(GE – SAS) – TSP/Telecom Sud Paris(Ecole) – Keeno sas(SAS Société par Actions Simplifiées)

CAAS [Contract completed]
    Title: Contextual Analysis and Adaptative Search
    Period: 2010-2014
    Scientific leader: Mothe, Josiane
    Partners: LIA Avignon(Laboratoire) – Cognition, Langue, langage Ergonomie (CLLE)(Laboratoire)

Acronym / Title / Period / Scientific leaders / Partners

PREVISION
    Title: Prediction and Visual Intelligence for Security Information
    Period: 2019-2021
    Scientific leader: Mothe, Josiane
    Partners: Fraunhofer-Gesellschaft zur Forderung der Angewandten Forschung E.V(Organisme étranger – Public) – SIVECO ROMANIA SA (Institution étrangère) – ICCS/Institute of communication and computer systems (Organisme étranger – Public) – UPV/Universitat Politecnica de Valencia(Organisme étranger – Public) – ETRA INVESTIGACION Y DESARROLLO SA(Organisme étranger – Public) – ITTI SP ZOO(Organisme étranger – Privé) – IFMPT INSTITUT FUR PROGNOSETECHNIK VERTRIEBS GMBH(Organisme étranger – Public) – BALTIJOS PAZANGIU TECHNOLOGIJU INSTITUTAS(Organisme étranger – Public) – ETHNIKO KENTRO EREVNAS KAI TECHNOLOGIKIS ANAPTYXIS(Organisme étranger – Public) – SPACE HELLAS ANONYMI ETAIREIA SYSTIMATA KAI YPIRESIES TILEPIKOINONIONPLIROFORIKIS ASFALEIAS – IDIOTIKI EPICHEIRISI PAROCHIS YPERISION ASFA(Organisme étranger – Public)

FabSpace 2.0 [Contract completed]
    Title: The Fablab for geodata-driven innovation – by leveraging Space data in particular, in Universities 2.0
    Period: 2016-2019
    Scientific leader: Mothe, Josiane
    Partners: UL/Université de Liège (UL)(EPCSCP) – Athena Research and Innovation Center in Information Communication & Knowledge Technologies(Institution étrangère) – TerraNIS(PME) – AV/Aesrospace Valley(Association) – UoRT/Université Degli Studi Di Roma Torvergata (Organisme étranger – Public) – ESA BIC LAZIO/BIC LAZIO SPA(Organisme étranger – Public) – WSL ESA BIC WR/WSL ESA BIC WR(Organisme étranger – Public) – Cesah ESABICDA/Cesah GmbH Centrum für Satellitennavigation Hessen(Organisme étranger – Public) – Ecole polytechnique de Varsovi/POLITECHNIKA WARZAWSKA(Organisme étranger – Public) – OPEGIEKA/OPEGIEKA SPOLKA Z OGRANICZONA ODPOWIEDZIALNOSCIA(Organisme étranger – Privé) – EBN/European Business and Innovation Centre Nerwork(Organisme étranger – Privé) – IDGEO/IDGEO(PME – SARL) – ICCS/Institute of communication and computer systems (Organisme étranger – Public)

SOMIR [Contract completed]
    Title: Semantic Oriented Multimedia Information Retrieval
    Period: 2009-2011
    Scientific leader: Sedes, Florence

LINDO [Contract completed]
    Title: Large scale distributed INDexation of multimedia Objets
    Period: 2007-2010
    Scientific leader: Sedes, Florence
    Partners: CEA/LIST(Laboratoire) – Supélec(EPCSCP) – Thales Security Systems(Grande Entreprise) – Space Applications Services(Grande Entreprise) – KUL/Katholieke Universiteit Leuven(Organisme étranger – Public) – Denodo(Institution étrangère) – T I+D/Telefonica Investigacion y Desarolla (Institution étrangère) – SGT(PME) – Hi-Store(Société Anonyme)

EDeAN [Contract completed]
    Title: European Design for All for eInclusion
    Period: 2006-2009
    Scientific leader: Vigouroux, Nadine
    Partners: CNR ISTI/Consiglio Nazionale delle Ricerche(Institution étrangère)

CONTRAPUNCTUS [Contract completed]
    Title: CONTRAPUNCTUS Braille Music Digital Sources
    Period: 2006-2009
    Scientific leader: Jessel, Nadine
    Partners: Arca Progetti SRL(Grande Entreprise) – Unione Italiana Ciechi – Verona(Association) – Biblioteca italiana/Biblioteca italiana per i ciechi ‘Regina Margherita'(Institution étrangère) – Stiching FNB(Institution étrangère) – Organizacion Nacional De Ciegos Españoles, Cidat(Institution étrangère) – Royal National Institute of the Blind(Institution étrangère) – IPTK LOGOS VOS(Institution étrangère) – Union Europénne des Aveugles(Association) – Conservatorio musicale di Padova(Institution étrangère)

WS-Talk [Contract completed]
    Title: Web services communicating in the language of their community
    Period: 2004-2006
    Scientific leader: Mothe, Josiane
    Partners: Lemonlabs GmbH(Institution étrangère)

LAMBDA [Contract completed]
    Title: Linear Access to Mathematic for Braille Device and Audio-synthesis
    Period: 2002-2005
    Scientific leader: Jessel, Nadine
    Partners: University of York(EPCSCP) – Università Statale di Milano (EPCSCP) – Ministero dell’Istruzione (MIUR)(Ministère) – Universität Stuttgart(EPCSCP) – EBU European Blind Union(Association) – ACAPO/ACAPO – Associação dos Cegos e Ambliopes de Portugal(Association) – BIC-Italian Library of the Blind (PME) – Dodecanese Association in Rhodes(Association) – Moscow’s Logos Center (Association) – ONCE/ONCE – Organización Nacional de Ciegos Españoles (Administration) – RNIB/RNIB Royal National Institute of the Blind(Administration) – Unione Italiana Ciechi – Verona(Association) – Veia Progetti SRL(PME)

E-STAGE [Contract completed]
    Title: A new stage for the cultural heritage in european Puppetry
    Period: 2001-2005
    Scientific leader: Mothe, Josiane
    Partners: UPS/Université Toulouse III(EPCSCP) – Lemonlabs GmbH(Institution étrangère) – Theater Waidspeicher – Deutsches Institut fuer Wirtschaftsforschung E.V.(Laboratoire)

Wednesday 2 October 2019, 14h00
Interrogation de données hétérogènes dans les bases de données orientées documents
Hamdi BEN HAMADOU – Team SIG, IRIT UT3 Paul Sabatier, IRIT, Auditorium J. Herbrand
#these
Friday 27 September 2019, 10h00
Partitioning And Local Matching Learning of Large Biomedical Ontologies
Amir LAADHAR – Team SIG, IRIT UT3 Paul Sabatier, IRIT, Auditorium J. Herbrand
#these
Tuesday 18 December 2018, 14h00
Modèles neuronaux pour la recherche d’information : approches dirigées par les ressources sémantiques
Gia Hung NGUYEN – Team IRIS – IRIT UT3 Paul Sabatier, IRIT, Auditorium J. Herbrand
#these
Wednesday 31 October 2018, 10h00
Approche dirigée par les modèles pour l’implantation de bases de données massives sur des SGBD NoSQL
Amal AIT BRAHIM – Team SIG – IRIT UT1 Capitole, Salle des Thèses
#these
Friday 28 September 2018, 10h00
Information Diffusion, Information and Knowledge Extraction From Social Networks
Thi Bich Ngoc HOANG – Team SIG – IRIT UT3 Paul Sabatier, IRIT, Auditorium J. Herbrand
#these
Monday 17 September 2018, 10h00
Synthèse d’événement dans les médias sociaux : résumé rétrospectif et prospectif de microblogs
Abdelhamid CHELLAL – Team IRIS – IRIT UT3 Paul Sabatier, IRIT, Auditorium J. Herbrand
#these
Tuesday 17 July 2018, 9h30
Information Quality in Online Social Media and Big Data Collection: An Example of Twitter Spam Detection
Mahdi WASHHA – Team SIG – IRIT UT3 Paul Sabatier, IRIT, Salle des Thèses
#these
Friday 12 January 2018, 10h30
Perspectives de Méta-Analyse pour un Environnement d’aide à la Simulation et Prédiction
William RAYNAUT – Team SIG – IRIT UT3 Paul Sabatier, IRIT, Auditorium J. Herbrand
#these
Tuesday 5 December 2017, 13h30
L’aide à la décision enrichie par le web des données
Jiefu SONG – Team SIG – IRIT UT1 Capitole, Manufacture des Tabacs, Salle des thèses
#these
Friday 6 October 2017, 10h00
Modèles de mémoires d’entreprise avec intégration automatique d’informations
Jeremy BASCANS – Team SIG – IRIT UT3 Paul Sabatier, IRIT, Auditorium J. Herbrand
#these
Sunday 21 July 2019 – Thursday 25 July 2019
SIGIR 2019 : 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval
Paris
#congres
Tuesday 6 November 2018 – Friday 9 November 2018
SAGEO 2018 : Spatial Analysis and GEOmatics
Montpellier
#congres
Thursday 21 June 2018 – Friday 22 June 2018
VSST 2018 : Séminaire international Veille Stratégique Scientifique et Technologique
UT3 Paul Sabatier, IRIT, Auditorium J. Herbrand
#congres
Monday 16 October 2017
Colloque SIF : Casser les codes — Femmes, genre et informatique
Institut des sciences de la communication – CNRS/Paris-Sorbonne/UPMC, Paris
#congres
Thursday 1 June 2017
Dans le cadre d’INFORSID 2017 : Atelier Systèmes d’information et de décision et Démocratie
Manufacture des Tabacs, salle MD001
#congres
Wednesday 31 May 2017
Dans le cadre d’INFORSID 2017 : Atelier Valorisation et Analyse des DOnnées de la Recherche (VADOR)
Manufacture des Tabacs
#congres
Tuesday 30 May 2017
Dans le cadre d’INFORSID 2017 : Atelier De la surveillance à la gestion de crise : prise en compte des alertes
Manufacture des Tabacs, salle MH001
#congres
Tuesday 30 May 2017 – Friday 2 June 2017
35e édition d’INFormatique des ORganisations et Systèmes d’Information et de Décision (INFORSID 2017)
Manufacture des Tabacs
#congres
Wednesday 9 March 2016 – Friday 11 March 2016
Semaine du Document Numérique et de la Recherche d’Information
Ecole Supérieure du Professorat et de l’Education (ESPE), Toulouse
#congres
Friday 13 November 2015
Journée Femmes et Sciences : Choisir et vivre une carrière scientifique ou technique au féminin : pourquoi, comment ?
Muséum d’Histoire Naturelle de Toulouse, Toulouse
#congres
Friday 5 July 2019, 15h20 – 16h00
A theory of information perspective on hyperspectral images
Mihai IVANOVICI – Transilvania University of Brasov (Romania), UT3 Paul Sabatier, IRIT, Salle des Thèses
#seminaire
Thursday 7 February 2019, 11h00 – 12h00
Usage patterns of non-native language speakers discovered by string kernels for native language identification
Radu TUDOR IONESCU – University of Bucharest (Romania), UT3 Paul Sabatier, IRIT, Salle du Conseil
#seminaire
Wednesday 6 February 2019, 14h00 – 15h30
Machine learning for anomaly detection in Video
Radu TUDOR IONESCU – University of Bucharest (Romania), UT3 Paul Sabatier, IRIT, Auditorium J. Herbrand
#seminaire
Monday 19 November 2018, 14h00 – 15h00
Information Theory
Mariam HARUTYUNYAN – Institute for Informatics and Automation Problems of National Academy of Sciences of Armenia (IIAP NAS RA) (Armenia), UT3 Paul Sabatier, IRIT, Auditorium J. Herbrand
#seminaire
Wednesday 18 July 2018, 14h00 – 15h00
The Evolution of Belief Rule Based Expert Systems: A New Paradigm of Computing
Mohammad SHAHADAT HOSSAIN – Dept. of Computer Science and Engineering, University of Chittagong (Bangladesh), UT3 Paul Sabatier, IRIT, Salle des Thèses
#seminaire
Friday 6 July 2018, 11h00 – 12h00
Non-linear approaches based on the maximum distance — a pseudo morphology and PCA approximation for color, multispectral and hyperspectral data/image analysis
Mihai IVANOVICI – Transilvania University of Brasov (Romania), UT3 Paul Sabatier, IRIT, Auditorium J. Herbrand
#seminaire
Friday 9 February 2018, 11h00 – 12h00
Thematic Semester on Mathematic and Computer Science in Biology: Computational biophotonics and surgical data science for next-generation cancer treatment
Prof. Dr. Lena MAIER-HEIN – Div. Computer Assisted Medical Interventions (CAMI), German Cancer Research Center (DKFZ) (Germany), UT3 Paul Sabatier, IRIT, Auditorium J. Herbrand
#seminaire
Friday 9 February 2018, 8h45 – 9h45
Thematic Semester on Mathematic and Computer Science in Biology: Opportunities and challenges of using clinical data, including that from the electronic health record
William R. HERSH – Oregon Health & Science University (OHSU) (United States), UT3 Paul Sabatier, IRIT, Auditorium J. Herbrand
#seminaire
Tuesday 16 May 2017, 12h30 – 14h00
Mon téléphone n’est plus une brique quand je veux écrire
Philippe ROUSSILLE – Team SIG – IRIT (France), UT1 Capitole, Manufacture des Tabacs, Salle ME302
#seminaire
Monday 7 November 2016, 14h00 – 16h00
A Test Collection for Research on Depression and Language Use
Fabio CRESTANI – Faculty of Informatics, University of Lugano (Switzerland), UT3 Paul Sabatier, IRIT, Auditorium J. Herbrand
#seminaire
Tuesday 10 September 2019
Journée Ingénierie des Exigences du GDR GPL – JET 2019
UT3 Paul Sabatier, IRIT, Auditorium J. Herbrand
#journee
Monday 1 October 2018
JET 2018 : Journée ingénierie des Exigences à Toulouse
UT3 Paul Sabatier, IRIT, Auditorium J. Herbrand
#journee
Tuesday 30 May 2017
Dans le cadre d’INFORSID 2017 : Atelier Open et/ou Linked Data dans les systèmes d’information
UT1 Capitole, Manufacture des tabacs – Hall Bât E – salle MH003
#journee
Monday 16 November 2015 – Tuesday 17 November 2015
Journées Big Data des GDR MADICS et MascotNUM – Trimestre thématique du LabEx CIMI
UT3 Paul Sabatier, IMT, Amphithéâtre Schwartz – Bât U4, Amphithéâtre Concorde
#journee