Lightly-supervised and Unsupervised Discovery of Audio Units
using Deep Learning

Main issues and objectives

In recent years, deep learning (DL) has become the state-of-the-art machine learning (ML) paradigm when applied in supervised settings to data with a latent structure, as in image, video, audio and speech processing. To deploy deep learning solutions in problems for which little or no annotated data are available, a growing body of research addresses lightly-supervised and unsupervised settings in DL, with methods such as one-shot learning for object categorization in images. This interest also exists in audio processing, but to a much smaller extent. LUDAU is a proposal to explore and strengthen this active research topic in the framework of DL and deep neural networks (DNNs). It aims to close this gap in audio processing by exploiting two powerful properties of DNNs, namely feature learning and multi-level representation learning, combined with state-of-the-art clustering techniques.

Two scenarios will be targeted:

1) a lightly-supervised scenario, in which coarse manual labels are available. By coarse labels, we mean labels that describe a recording as a whole.

2) a zero-resource, or unsupervised, scenario, in which only raw audio recordings are available.

The main motivation behind LUDAU is to minimize the manual labeling effort by relying on coarse labels only, and then, ideally, to remove the need for coarse labeling altogether. To reach this goal, we plan to: 1) propose new methods to extract feature representations that better discriminate between audio units, and 2) segment and cluster the audio signal representations into useful and meaningful elementary units.

Core Team


  • "Jeune Chercheur (JCJC)" (young researcher) research project funded by the Agence Nationale de la Recherche (ANR), Appel à Projets Générique 2018 (generic call for proposals), section CE23 (Données, Connaissances, Big data, Contenus multimédias, Intelligence
  • 222 k€


  • Start date: 1st October 2018
  • End date: 31st March 2022 (duration: 42 months)