Frugal prediction of a supercomputer's load to reduce its energy impact

Keywords: prediction, load, hpc, scheduling, uncertainty, energy. Supervisors: Millian Poquet, Georges Da Costa. Context: In the world of high-performance computing, a supercomputer is a computing platform shared by many users who run applications on it, notably to launch large-scale scientific simulation campaigns. Recent supercomputers can have a very large number of resources (on the order of a million cores), so users do not access the resources directly; they go through a resource manager (such as SLURM[1]) to reserve compute nodes/cores and to run applications on them.

Replaying with feedback: towards more realistic HPC simulations

Topic: Researchers use simulations to compare the performance (execution time, energy efficiency, …) of different scheduling algorithms on High-Performance Computing (HPC) platforms. The most common method is to replay historical workloads recorded on real HPC infrastructures (like the ones available in the Parallel Workloads Archive): jobs are submitted to the simulation at the same timestamps as in the original log. A major drawback of this method is that it does not preserve the submission behavior of the platform's users.
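The drawback can be seen on a two-job toy example. The sketch below is only an illustration under stated assumptions: the `Job` fields, the single-user dependency between the two jobs, and the "think time" rule (resubmit after the previous job ends, plus the delay observed in the log) are hypothetical choices, not the method of the work described here.

```python
from dataclasses import dataclass


@dataclass
class Job:
    submit: float  # submission time in the original log (seconds)
    end: float     # completion time in the original log (seconds)


def think_time(previous: Job, current: Job) -> float:
    """Delay between the end of the previous job and the next submission in the log."""
    return current.submit - previous.end


def replay_fixed(current: Job) -> float:
    """Classic replay: reuse the original timestamp, whatever the simulation did."""
    return current.submit


def replay_with_feedback(previous: Job, current: Job, simulated_previous_end: float) -> float:
    """Hypothetical feedback rule: keep the user's think time after the simulated end."""
    delay = think_time(previous, current)
    if delay < 0:
        # The user did not wait for the previous job, so no dependency is assumed.
        return current.submit
    return simulated_previous_end + delay


if __name__ == "__main__":
    first = Job(submit=0.0, end=100.0)
    second = Job(submit=160.0, end=300.0)  # submitted 60 s after the first job ended
    simulated_first_end = 400.0            # the simulated scheduler was slower
    print("fixed replay submits at", replay_fixed(second))
    print("feedback replay submits at", replay_with_feedback(first, second, simulated_first_end))
```

With a fixed replay, the second job enters the simulation before the first one has finished there, which the user never did in the original log; the feedback variant keeps the ordering.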

Game Theory for Green Datacenters

In order to operate a datacenter only with renewable energies, a negotiation has to be undertaken between the sources providing and storing the energy (solar panels, wind turbines, batteries, hydrogen tanks) and the consumers of the energy (basically the IT infrastructure). In the context of the ANR DATAZERO2 project, a negotiation module has to be improved, starting from an existing, already published proof of concept. The improvement will be included in a dedicated module, interoperable with a functioning middleware developed in the project.

Federation of clouds: Multi-Clouds overflow

To cover data analytics needs, Cloud providers need to adapt their IaaS services to fluctuations in resource consumption and demand. This requires geographically distributed task execution and flexible services. A federation of cloud providers makes it possible to offer such services to users. In this project, users submit their applications to a cloud broker. The aim is to find resources in one or several clouds that can satisfy the request.

DVFS-aware performance and energy model of HPC applications

The power consumption of computers is becoming a major concern. To optimise it, precise information on the behavior of applications is necessary. With this information, it is possible to choose the right frequency for a processor. The speed of some applications is barely affected by changes to this frequency, while for others the effect is significant. The goal of this internship is to model the fine-grained behavior of applications and to link this behavior with the impact (on performance and energy) of frequency changes.
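One way to picture the link between frequency, speed and energy is a toy model in which run time splits into a part that scales with frequency and a part that does not. Everything in this sketch is an illustrative assumption rather than the model the internship aims to build: the `cpu_fraction` parameter, the cubic dynamic power law, and the constants `p_static_w` and `k_w_per_ghz3` are hypothetical.

```python
def run_time(base_time_s, cpu_fraction, f_ghz, f_max_ghz):
    """Assumed model: only the CPU-bound fraction slows down when frequency drops."""
    cpu_part = base_time_s * cpu_fraction * (f_max_ghz / f_ghz)
    other_part = base_time_s * (1.0 - cpu_fraction)  # e.g. memory-bound time
    return cpu_part + other_part


def energy(base_time_s, cpu_fraction, f_ghz, f_max_ghz,
           p_static_w=40.0, k_w_per_ghz3=3.0):
    """Energy in joules, assuming power = static part + k * f^3 (hypothetical constants)."""
    power_w = p_static_w + k_w_per_ghz3 * f_ghz ** 3
    return power_w * run_time(base_time_s, cpu_fraction, f_ghz, f_max_ghz)


def best_frequency(base_time_s, cpu_fraction, frequencies_ghz):
    """Pick the frequency that minimises energy under the assumed model."""
    f_max = max(frequencies_ghz)
    return min(frequencies_ghz,
               key=lambda f: energy(base_time_s, cpu_fraction, f, f_max))


if __name__ == "__main__":
    freqs = [1.0, 1.5, 2.0, 2.5, 3.0]
    for label, fraction in [("memory-bound", 0.2), ("compute-bound", 0.9)]:
        f = best_frequency(100.0, fraction, freqs)
        print(label, "-> lowest-energy frequency:", f, "GHz")
```

Under these assumptions, an application whose time is mostly spent outside the CPU reaches its lowest energy at a lower frequency than a compute-bound one, which is the kind of distinction a fine-grained behavior model would need to capture.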

Fast scheduling under energy and QoS constraints in a Fog computing environment

Location: LAAS-CNRS - Team SARA or IRIT - Team SEPIA. Supervisors: Da Costa Georges / Guérout Tom. Duration: 6 months, possibility of thesis afterwards. Context: The explosion of the volume of data exchanged within today's IT systems, driven by ever wider use by an ever wider audience (large organizations, companies, general public, etc.), has for several years called into question the architectures used until now. Indeed, for the past few years, Fog computing [1], which extends the Cloud computing paradigm to the edge of the network, has been developing steadily, offering more and more possibilities and thus extending the field of Internet of Things applications.

Impact of processor temperature on HPC application performance and energy consumption

Large-scale datacenters manage applications as black boxes. Most of the time, they assume that application behavior is not linked to the state of the underlying hardware. When an application runs on a hot processor, it can be slowed down arbitrarily by the processor as it tries to protect itself. The goal of this internship is to evaluate the impact of temperature on the speed of the code, the impact of the execution of the code on temperature, and the possibility of reducing the processor's frequency at key points to cool it down (and thus speed up the application).

Performance and energy models of colocated applications

Large-scale datacenters manage applications as black boxes. Most of the time, they assume that applications have no cross impact. When multiple applications use the memory, their speed is reduced because of the memory bus bottleneck. Conversely, two applications on the same core might not each run at half speed: for example, if one uses only floating-point operations while the other performs only memory accesses.

Sufficient cloud: off-grid scheduling for environmentally responsible users

Topic: Avoiding the ecological catastrophe will require a joint effort from every actor in society, the ICT industry included. We postulate that some environmentally aware individuals are willing to reflect upon and reduce the footprint associated with their usage of new technologies. Similarly to the Low-tech Magazine[1], a solar-powered and very lightweight website, this internship will study an off-grid “sufficient”[2] data center in which some of the users accept to delay, degrade or even cancel the execution of their tasks to reduce the overall footprint of the infrastructure.

Scheduling of malleable HPC applications

The position is offered in the framework of the French ANR-funded ENERGUMEN project and will take place at the IRIT laboratory in Toulouse, France. This project aims at proposing and evaluating new scheduling heuristics for malleable/reconfigurable HPC tasks (i.e. able to change their number of resources at runtime), taking into account computing requirements but also the data movement that occurs during reconfiguration. We intend to study bi-objective problems using simulation, optimizing both consumed energy and a performance criterion, e.