Thomas Pellegrini / September 23, 2020 / Current

Lightly-supervised and Unsupervised Discovery of Audio Units using Deep Learning (ANR-18-CE23-0005-01)

Main issues and objectives

     In recent years, deep learning (DL) has become the state-of-the-art machine learning (ML) paradigm when applied in supervised settings to data with a latent structure, as in image, video, audio and speech processing. To deploy deep learning solutions to problems for which little or no annotated data is available, a growing body of research addresses lightly-supervised and unsupervised settings in DL, with methods such as one-shot learning for object categorization in images. This interest also exists in audio processing, but to a much smaller extent. LUDAU will remedy this situation by exploring the powerful properties of deep neural networks (DNNs), namely feature and multi-level representation learning, combined with state-of-the-art clustering techniques. LUDAU is a proposal to explore and strengthen this active research topic in the framework of DL and DNNs.

Two scenarios will be targeted:

  1) a lightly-supervised scenario in which coarse manual labels are available. By coarse labels, we refer to labels that globally describe a recording.

  2) a zero-resource or unsupervised scenario when only raw audio recordings are available.

The main motivation behind LUDAU is to minimize the manual labeling effort by relying on coarse labels and then, ideally, to remove the need for coarse labeling altogether. To reach this goal, we plan to: 1) propose new methods to extract feature representations that better discriminate between audio units, and 2) segment and cluster the audio signal representations into useful and meaningful elementary units.


  • Thomas Pellegrini (scientific coordinator)
  • Julien Pinquier
  • Sandrine Mouysset
  • Mathieu Serrurier
  • Léo Cancès (PhD candidate, started 01/10/2018, funded by the project)
  • Rasa Lileikyte (Post-doc, 02/10/2019 – 17/09/2020, funded by the project)
  • Erwan Gateau-Magdeleine, intern, real-time speech recognition for art performances, 03/06/2019-13/09/2019
  • Thomas Hustache, intern, Data augmentation for Sound Event Detection, 03/06/2019-13/09/2019


  • “Jeune Chercheur (JCJC)” research project funded by the Agence Nationale de la Recherche (ANR), ANR-18-CE23-0005-01, Appel à Projet Générique 2018, section CE23 (Données, Connaissances, Big data, Contenus multimédias, Intelligence
  • 222 k€


  • Start date: 1st October 2018
  • End date: 31st March 2022 (duration: 42 months)

Publications and communications


  • T. Pellegrini, R. Zimmer, T. Masquelier. Low-activity supervised convolutional spiking neural networks applied to speech commands recognition. To appear in Proc. IEEE Spoken Language Technology Workshop, Shenzhen, Jan. 2021



  • Oral communication “Introdução às redes neuronais para reconhecimento de fala ‘end-to-end’” (Introduction to neural networks for end-to-end speech recognition), ISCTE-IUL, Lisbon (remote), 13-11-2020
  • Oral communication “Deep learning with weakly-annotated data: a sound event detection use case (and hate speech detection here and there)”, Workshop on Machine Learning for Trend and Weak Signal Detection in Social Networks and Social Media, Toulouse, 27-02-2020


  • L. Cances, P. Guyot, T. Pellegrini. Evaluation of Post-Processing Algorithms for Polyphonic Sound Event Detection. In Proc. IEEE WASPAA, New Paltz, Oct. 2019
  • Art-science event “Turing Test”, labelled as part of the CNRS 80th anniversary: participation in a post-show discussion (“bord de scène”) on the topic of AI, around the play Turing Test by the NoKill company. Université Paul Sabatier, 18/11/2019
  • A. Heba, T. Pellegrini, J.-P. Lorré, R. André-Obrecht. Char+CV-CTC: combining graphemes and consonant/vowel units for CTC-based ASR using Multitask Learning. In Proc. Interspeech, Graz, Sept. 2019
  • C. Gendrot, E. Ferragne, T. Pellegrini. Deep learning and voice comparison: phonetically-motivated vs. automatically-learned features. In Proc. International Congress of Phonetic Sciences (ICPhS 2019), Melbourne, IPA : International Phonetic Association, Aug. 2019
  • E. Ferragne, T. Pellegrini, C. Gendrot. Towards phonetic interpretability in deep learning applied to voice comparison. In Proc. International Congress of Phonetic Sciences (ICPhS 2019), Melbourne, IPA : International Phonetic Association, Aug. 2019
  • T. Pellegrini, L. Cances. Cosine-similarity penalty to discriminate sound classes in weakly-supervised sound event detection. In Proc. International Joint Conference on Neural Networks (IJCNN 2019), Budapest, 14/07/2019-19/07/2019, INNS : International Neural Network Society, July 2019
  • T. Rolland, A. Basarab, T. Pellegrini. Label-consistent sparse auto-encoders. In Proc. Workshop on Signal Processing with Adaptive Sparse Structured Representations (SPARS 2019), Toulouse, July 2019


  • L. Cances, T. Pellegrini, P. Guyot. Sound event detection from weak annotations: weighted-GRU versus multi-instance-learning. In Proc. IEEE AASP Challenge on Detection and Classification of Acoustic Scenes and Events, Surrey, Tampere University of Technology, pp. 64-68, Nov. 2018
