The priority strategic action on “Scientific Computing, Big Data and AI” is an evolution of the strategic axis on “Big Data and Scientific Computing” that ran from 2007 to 2017.
Until 2017, the strategic axis gathered IRIT researchers and projects dealing with big data analysis, algorithms, architectures and frameworks to manage, process, analyze and produce knowledge and value from complex, heterogeneous, large datasets or dataflows. The availability of huge datasets has driven the success of machine learning algorithms and the advances of deep learning in statistical AI, so that many research issues are now shared by the data and AI communities. In order to increase synergies between IRIT research groups in machine learning, data science and other strong topics in AI, it was agreed with the IRIT AI department to extend the scope of the strategic action to AI, Big Data and scientific computing.
The missions of the strategic action are: to ease information exchange regarding calls for projects, events and any initiative about data and AI; to foster collaborations involving various competencies within IRIT, as well as cross-disciplinary collaborations with other labs in Toulouse that require data-intensive use or generation; to promote the use and evolution of the OSIRIM data storage and computing platform; and to increase the visibility of research work and results from IRIT groups, targeting other scientific communities and industry at the local level, and SIGs such as the MADICS GDR on Big Data or the Mastodons program at the national level.
To reach these goals, the strategic action used the classical tools of a SIG (monthly meetings, mailing list). We also encouraged IRIT researchers to take part in seminars and industrial fora; we coordinated the joint publication of a special issue of Noir sur Blanc, the IRIT journal, about Data Science; and we organized several scientific workshops and seminars with invited keynote speakers. Another role of the strategic action was to manage the selection of an annual Master 2 internship grant and of one PhD grant every two years, promoting inter-group collaborations on research dealing with “real” big data.
- Primary data collection and metadata generation: numeric, text, video, sound, satellite images, sensor data, social data, news, etc.; dataset storage, filtering, quality assessment, data indexing; semantic representations of metadata, data enrichment with open data;
- Data storage, data repositories: data filtering and preliminary processing, big and heterogeneous data storage platforms; security and access issues (evaluation of NoSQL data repositories);
- Data aggregation, integration, modeling and management: virtual models, distributed architectures and networks for big data (grids, processing clusters); complex, real-time and heterogeneous data repair and integration (virtual modeling, ontology-based semantic data integration); uncertainty management;
- Data exploitation and processing: algorithms for data indexing, search, querying and mining; machine learning algorithms: statistical and deep learning of data structures and representations, optimization, dimension reduction and matrix computing for processing and analyzing images, natural language in text, genomic data, videos, sounds (music, spoken natural language, noise); models for information retrieval and for high-performance scientific computing; knowledge representation and engineering, ontologies, non-monotonic reasoning, managing uncertain and fuzzy knowledge, models of argumentation;
- Infrastructures for data management: grids and distributed architectures for data storage and scientific computing, cost and energy optimization, middleware for data and service management, virtual environments;
- Data querying and visualization: searching social media and data flows, distributed querying over distributed datasets and consistency management of the answers; natural language queries; providing rapid access to large heterogeneous datasets; distributed inference and semantic search on large datasets and IoT data; industrial data visualization combining several modalities and levels of detail.
- Nov 2014: CIMI scientific seminar “Big data, questions de recherche en Midi-Pyrénées” (joint IRIT-IMT organization). 10 speakers from 10 labs in Toulouse (computer science, math, biology, aeronautics and space) and a keynote speaker: Mokrane BOUZEGHOUB (MASTODONS and GDR MADICS).
- 2015: presentation of “masses de données et calcul à l’IRIT” at Innovation IT Days (industrial fair); two-day CIMI workshop about Big Data (jointly organized with the CMI-SID Master, supported by GDR MADICS and MascotNUM): 3 keynotes (MADICS projects) and 10 academic or company talks.
- 2016: participation in Innovation IT Days; organization of a CIMI scientific seminar “science des données à l’IRIT” (3 IRIT talks, 1 keynote by Patrick Valduriez and 8 workshops about specific data types); organization of the ENADOC-MADICS workshop at IRIT; publication of a special issue of Noir sur Blanc (IRIT journal) about Data Science (https://www.irit.fr/flipbook/FlipBook_NsB/NsB_Sciences_Donnees.html#p=1).
- 2017: organization of a seminar “science des données à l’IRIT” about “data quality, data and decision making” (9 speakers); talk during an ESOF seminar about “Big Data and health”; survey to map IRIT researchers, groups, theses and projects related to big data and AI; contribution to the mapping project launched by RTRA STAE in relation with the “Data economy” initiative.
- 2018: organization of a seminar “Deep Learning à l’IRIT” (talks from 9 IRIT groups and 10 PhD posters presenting research using or bearing on machine learning); hosting of a Swedish delegation from WASP (Wallenberg AI, Autonomous Systems and Software Program) (4 talks, 30 PhD posters); in May 2018, launch of the DataNoos scientific alliance, funded by RTRA STAE (19 labs and data centers from Toulouse) and led by IRIT, to structure the academic community in Toulouse and promote open science in keeping with RDA recommendations.
- 2019: CIMI thematic semester about Optimization; several responses to the ANR Flash call for projects about Data (2 from DataNoos groups); organization of the ODIM seminar (Ontologies, données et informatique médicale) with the Health DAS (2 keynotes: M. Musen from Stanford and J.F. Ethier from Sherbrooke Univ., and 6 talks from IRIT).