Posts tagged "données"

Zone-based Datalake for big data, small data and IoT Data.

neOCampus – IRIT – CNRS, Toulouse University


Big Data, datalake, big data analytics, IoT, data management, data analysis, open-source, open science, semantic web

IoT data is increasingly central to today's society. Whether the aim is to analyze a market or a product, or to study a specific research area, it is increasingly necessary to integrate IoT data and to combine it with massive data produced internally or obtained externally as Open Data. A complete view requires integrating both voluminous fast data and numerous small data sets. To address the Vs of Big Data, we have designed an architecture that manages the Volume, Velocity, Variety and Veracity of data in order to generate Value. The architecture aims to make it simple to cross-reference data regardless of volume, type or rate, while emphasizing data security, the enhancement of the data through advanced use of metadata, and the exploitation of that metadata through high added-value services.

Scientific goals

- Manage any type of data in large volumes with efficiency

- Create value through adequate data modeling

- Enable cross-analysis of heterogeneous data simply in the Big Data context



Scientific Paper

DANG, ZHAO, MEGDICHE, RAVAT (2021), A Zone-Based Data Lake Architecture for IoT, Small and Big Data. IDEAS 2021, to appear. (DOI: 10.1145/3472163.3472185 / ISBN: 978-1-4503-8991-4/21/07)

SANDMAN: Anomaly Detection in a Data Stream Issued from Smart Buildings

IRIT and LMDC, Toulouse University


Anomaly detection, multi-agent system, smart buildings, energy management, data stream

This research project addresses energy efficiency in buildings as a way to mitigate climate change. Buildings are the largest source of energy consumption worldwide. However, a large part of this energy is wasted, mainly because of poor building management. Being accurately informed about consumption and detecting anomalies are therefore essential steps toward solving this problem. Some existing software can already record, store, archive and visualize big data such as that of a building, a campus or a city. Yet it does not provide Artificial Intelligence (AI) able to automatically analyze streaming data, detect anomalies and send alerts. To improve energy management, an innovative anomaly detection system should analyze raw data and detect any kind of anomaly (point, contextual, collective) in an open environment and at large scale. The AI system developed here is called SANDMAN (semi-Supervised ANomaly Detection with Multi-AgeNt systems). It is semi-supervised by a domain expert who confirms or overturns SANDMAN's feedback. It processes data under time constraints in order to detect anomalies as early as possible. SANDMAN is based on the paradigm of self-adaptive multi-agent systems. The results show that the AI is robust to noisy data, to different types of anomalies, and to scaling.

Scientific goal

Anomaly detection in smart-building streaming data using a semi-supervised multi-agent system.


Deployment of a bicycle-mounted system for tracking travel and pollution, for the secure provision of atmospheric data.

This thesis is part of the CLUE project: Cycle-based Laboratory for Urban Evolution. This scientific project aims to equip some of the bicycles circulating on campus and in Toulouse with a set of sensors in order to study users' journeys, and also to take advantage of the resulting mobile sensor network to collect information on air pollution on campus and, more broadly, in the city.

Scientific goals

More specifically, the goal of this thesis is organized around the following points:

• Collecting a dataset in Toulouse (mobility data and measurements of atmospheric pollutants), which does not exist to date, and making it available.

• Deploying a wireless data collection node based on LoRa technology (long range, low energy consumption), and securing sensitive data (location).

• Presenting the data to the various stakeholders/users (aerology researchers, cyclists, people in charge of campus planning):

– Multi-role data access control system

– Trade-off between privacy protection and data usability

• Integrating various existing sensors and testing them in a real environment, in particular "black carbon" and nitrogen oxide sensors

• Refining and validating in situ the pollutant dispersion models used in aerology


- Christophe Bertero (LAAS)

- Jean-François Léon (LA)

- Matthieu Roy (LAAS)

- Gilles Tredan (LAAS)


Toward a Data Lake

Context Presentation

neOCampus is a large operation involving many kinds of projects and actors. Started in 2013, it aims to improve the everyday life of university campus users through data analysis for people, reduced fluid consumption, a smaller building environmental footprint, and so on. Overall, it aims to make the campus smarter. All these projects have one thing in common: data. Images, sensor logs, administrative data, configurations: every kind of data is present, and each must be stored somewhere.

This project addresses that problem with a data management architecture known as the data lake. Such a solution must handle every kind of data and make it possible to follow a piece of data through its life, from ingestion to its use in a project. Storing every kind of data is not enough: the system must also know what is stored and where, and keep it in a format that makes it as easy as possible to use. When new data arrives, the system automatically stores it in raw form, determines the most valuable format, extracts information from it and makes this knowledge available for any purpose.
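The ingestion flow described above (store raw, identify a format, extract information, expose it) can be sketched as follows. This is only a minimal illustration under stated assumptions: the names `ingest`, `guess_format`, `raw_zone` and `catalog`, the content-hash naming scheme and the format heuristic are all hypothetical and are not taken from the neOCampus data lake.

```python
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path


def guess_format(payload: bytes) -> str:
    """Rough format detection (hypothetical heuristic, not the project's)."""
    try:
        text = payload.decode("utf-8")
    except UnicodeDecodeError:
        return "binary"
    try:
        json.loads(text)
        return "json"
    except ValueError:
        return "text"


def ingest(payload: bytes, source_name: str, raw_zone: Path, catalog: dict) -> str:
    """Store the payload untouched, then record metadata describing it."""
    # Assumption: the raw object is named after its SHA-256 digest.
    digest = hashlib.sha256(payload).hexdigest()
    raw_zone.mkdir(parents=True, exist_ok=True)
    raw_path = raw_zone / digest
    raw_path.write_bytes(payload)

    # The catalog entry answers "what is stored, where, and in which format".
    catalog[digest] = {
        "source": source_name,
        "stored_at": str(raw_path),
        "size_bytes": len(payload),
        "format": guess_format(payload),
        "ingested_at": datetime.now(timezone.utc).isoformat(),
    }
    return digest
```

In this sketch the catalog is a plain dictionary; a real deployment would presumably rely on a dedicated metadata store, which the text above does not specify.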

Figure: datalake (Vincent-Nam Dang)


Data Lake, Data Driven Project, Big Data, Data Management, Data Analysis

Scientific goal

• To develop a data lake architecture to replace the current architecture of the data management system in neOCampus.


Stream Analysis and Filtering for Reliability and Post-processing of Sensor Big Data

Context Presentation

Anomaly detection in real fluid distribution applications is a difficult task, especially when we seek to accurately detect different types of anomalies and possible sensor failures. Our case study is based on a real context: sensor data from the SGE (Rangueil campus management and operation service in Toulouse).

We propose an automatic pattern-based method for anomaly detection in time series, called Composition-based Decision Tree (CDT). We use a modified decision tree and Bayesian optimization to avoid manual tuning of hyper-parameters. Our method uses sequences of patterns to identify remarkable points corresponding to multiple anomalies. The compositions of patterns present in the time series are learned through an internally generated decision tree and then simplified using Boolean algebra to produce intelligible rules.

Our approach automatically generates decision rules for anomaly detection. All our experiments were carried out on real and synthetic data. They show that the method classifies anomalies precisely compared with other methods. It also generates rules that experts and analysts can interpret, understand, adjust and modify.

Figure: Image_IBK (Ines Ben Kraiem)


Anomaly detection, Time-series, Machine learning, Classification rules

Scientific goals

•    To detect different types of anomalies observed in real deployment

•    To generate interpretable rules for anomaly detection

•    To use learning methods for anomaly detection on static and continuous data


Interaction Techniques for Situated Data through a Physical Model

Context Presentation

Over the last decades, the amount of data produced has grown to 29,000 GB each second. Understanding this data requires tools that transform numbers, text and images into concrete representations. The field of data visualization aims to produce representations that make it possible to view and analyze abstract data. Buildings, people and vehicles produce large amounts of data collected by many sensors. This data is tied to a physical location (e.g. the number of people in a room relates to that room, the humidity on a floor relates to that floor, etc.). Bringing the data close to its physical context and displaying it there helps people form a better mental representation of it (Embedded Data Representations, Willett et al., 2017).

In this project, we aim to design interaction techniques for navigating and manipulating data close to a physical referent. The main goal is to develop a fully interactive physical model of the campus endowed with situated data.

Figure: 3d flat retouche5 (Florent Cabric)


Interaction Techniques, Situated Data, Phygital Model, Human Computer Interaction

Scientific goals

- Design and evaluate interaction techniques to explore a digital model

- Design and evaluate interaction techniques with situated data

- Build a physical model of the campus endowed with situated data and interactive capabilities


Design of a Timeline Component for Timed Data Analysis

Context Presentation

SandFox is a collaborative project between IRIT and the company Berger-Levrault, and is part of the neOCampus initiative. The goal is to find the best ways to represent and interact with data. Because the data is time-stamped, we would like to be able to compare it across different periods.

To this end, we surveyed existing models of interaction with data, looking for those that most closely matched our collaborators' expectations. Based on these models, we moved on to designing low- and medium-fidelity prototypes. For the selected model, we chose a circular representation, which makes it easier to compare several periods of time. We produced a low-fidelity prototype (a paper prototype), and a medium-fidelity prototype (made with Adobe XD) is in progress.
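One possible way to see why a circular layout helps compare periods is to map each timestamp to an angle within its period, and each period to its own ring. The sketch below is purely illustrative: it assumes a one-day period and concentric rings, and the name `polar_position` and its parameters are hypothetical, not part of the SandFox prototypes.

```python
import math
from datetime import datetime

SECONDS_PER_DAY = 24 * 60 * 60


def polar_position(ts: datetime, period_index: int, ring_spacing: float = 1.0):
    """Map a timestamp to (angle, radius) on a circular timeline.

    Assumption: the period is one day, so the angle encodes the time of day
    and the radius encodes which day (period) the reading belongs to.
    """
    seconds_into_day = ts.hour * 3600 + ts.minute * 60 + ts.second
    angle = 2 * math.pi * seconds_into_day / SECONDS_PER_DAY
    radius = (period_index + 1) * ring_spacing
    return angle, radius
```

With this mapping, readings taken at the same time of day on different days share the same angle and differ only in radius, so they line up along a single spoke of the circle.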

In conclusion, we found a representation that gives a clear view of the data but lacks interactive elements for switching from one building's data to another's, and its interaction modalities have not yet been clearly defined.

Figure: sandfox_timeline_clastres (Flych)



Human-Computer Interaction, SandFox, data, interaction, neOCampus, Data Visualization, Data Interaction

Scientific goals

Facilitate interaction with temporal data from different sources and/or different time periods.


Stream Analysis and Filtering for Reliability and Post-processing of Sensor Big data

Context Presentation

Anomaly detection in real fluid distribution applications is a difficult task, especially when we seek to accurately detect different types of anomalies and possible sensor failures. Solving this problem is increasingly important in building management and supervision applications. Our case study is based on a real context: sensor data from the SGE (Rangueil campus management and operation service in Toulouse).

We propose CoRP ("Composition of Remarkable Points"), a configurable approach based on pattern modelling for the simultaneous detection of multiple anomalies. CoRP evaluates a set of user-defined patterns in order to tag remarkable points with labels, then detects anomalies among them by composition of labels. CoRP is evaluated on real SGE datasets and on state-of-the-art datasets, and is compared with classical approaches.
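The two-step idea described above (label remarkable points with user-defined patterns, then flag anomalies as compositions of labels) can be illustrated with a toy sketch. It is not the CoRP implementation: the pattern signature, the label names `UP` and `DOWN`, the threshold and the `spike` composition are all assumptions chosen for the example.

```python
from typing import Callable, Dict, List, Optional, Sequence, Tuple

# A pattern compares the previous and current values (hypothetical signature).
Pattern = Callable[[float, float], bool]


def tag_points(series: Sequence[float],
               patterns: Dict[str, Pattern]) -> List[Optional[str]]:
    """Give each point the label of the first pattern it matches, else None."""
    labels: List[Optional[str]] = [None] * len(series)
    for i in range(1, len(series)):
        for label, matches in patterns.items():
            if matches(series[i - 1], series[i]):
                labels[i] = label
                break
    return labels


def detect(labels: Sequence[Optional[str]],
           compositions: Dict[Tuple[str, ...], str]) -> List[Tuple[int, str]]:
    """Report (start index, anomaly name) wherever a label sequence occurs."""
    found = []
    for i in range(len(labels)):
        for sequence, name in compositions.items():
            if tuple(labels[i:i + len(sequence)]) == sequence:
                found.append((i, name))
    return found


# Illustrative patterns: a jump of more than 20 units up or down.
patterns = {
    "UP": lambda prev, cur: cur - prev > 20,
    "DOWN": lambda prev, cur: prev - cur > 20,
}
# Illustrative composition: a jump up immediately followed by a jump down.
compositions = {("UP", "DOWN"): "spike"}

readings = [10, 10, 11, 50, 10, 10]
anomalies = detect(tag_points(readings, patterns), compositions)
```

Here the reading of 50 is labelled `UP` and the following reading `DOWN`, so the composition reports a spike starting at index 3. Changing the patterns or compositions changes what is detected without touching the two functions, which is the sense in which such an approach is configurable.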


Figure 1: « Anomaly Detection in Sensor Networks »

Scientific Goals

- Detect different types of anomalies observed in real deployment

- Improve the supervision of sensor networks

- Use learning methods for anomaly detection on static and continuous data



neOCampus, Sensor Data, Univariate Time Series, Anomaly Detection, Pattern-based Method


Information modelling for the development of sustainable construction (MINDOC)

Context Presentation

In recent decades, controlling environmental impact through lifecycle analysis has become a hot topic in various fields. In some countries, such as France, key energy figures show that the building sector alone consumes around 45% of the energy produced each year. This observation prompted the idea of improving the methods used so far in this field, in particular those related to the exchange of information between the various stakeholders involved throughout the lifecycle of a building. Information is crucial for conducting studies of a building, for instance the assessment of its environmental impact. To address information exchange issues, open standards such as Industry Foundation Classes (IFC) and CityGML, as well as semantic web technologies, have been widely used, with some success. Another striking issue is the heterogeneity of construction product databases. It would be particularly valuable to know the environmental impact of a building in the early phases of its lifecycle. However, a number of problems remain unsolved. These include associating Building Information Modelling (BIM) and semantic web technologies with environmental databases to provide the flexibility needed to assess a building's environmental impact throughout its lifecycle.


Figure 1: MINDOC methodology process

Scientific Goals

- Study how information is exchanged among experts during a building's lifecycle in order to identify interoperability gaps.

- Fill some of the gaps encountered by means of a formalization of building information.

- Combined with the formalization of environmental data on construction products, this formalization will enable the introduction of product data at an early stage of the building lifecycle.


Knowledge Modeling & Semantic Reasoning - Merging Ontologies - Decision Support - Building Information Modeling (BIM) - Environmental Databases.


Hybrid IoT: a Multi-Agent System for Persistent Data Accessibility in Smart Cities

Context Presentation

The reality of a smart campus, or more generally of a smart city, relies on regular observation of the environment by ad hoc sensors, in order to act on the environment with automatic devices and improve users' well-being. These sensors provide knowledge of human activities and of the conditions in which those activities take place, but deploying a large number of sensors can be costly. The costs are mainly related to the installation, maintenance and infrastructure of sensors in existing buildings. For these reasons, this thesis aims to reduce these costs by making daily use of thousands of partial and intermittent pieces of information coming from the smartphones of users of the Université Toulouse III Paul Sabatier campus. This processing is based on an Artificial Intelligence technology using cooperative multi-agent systems.



Figure 1: « Information from intermittent and mobile devices is used to provide accurate estimates »

Scientific goals

- Learn from raw, imprecise and intermittent data without feedback.

- Provide information continuously, even in the absence of data from users' smartphones.

- Use a hybrid Internet of Things approach that mixes real and virtual sensors.

Keywords

Self-adaptive multi-agent systems, data fusion, learning, smart campus


Davide Andrea Guastella, Valérie Camps, Marie-Pierre Gleizes
