Some code for these demos is available:

  1. Feb. 2019. Cosine-similarity penalty to discriminate sound classes in weakly-supervised sound event detection.

  2. January 2017. Bird Audio Detection 2017 challenge: my solution was based on densely connected CNNs (final AUC score: 88.2%, rank: 3rd/30, official result page). Code available at GitHub repo. Paper at EUSIPCO'17 (PDF). Saliency maps can be extracted to show which frequency components mattered most for the network's decisions. The spectrograms of the original audio samples can be multiplied by the saliency maps and then converted back to the time domain to synthesize "saliency-masked" audio samples. Audio examples available at bird sounds.

  3. Convolutional neural networks are amazing at extracting their own representations of structured data, such as speech. Here is an illustration showing that phonetic categories, such as voiced plosives, voiceless plosives, and fricatives, are inferred by the layers of a network trained for French phone recognition. Click on the image to get an interactive 3-D representation of PCA applied to the activations of the first layer of this CNN. (Paper @ INTERSPEECH 2016)

  4. Speaking and eating: demo of a classifier detecting 6 food types and a "no food" type. This task was part of the INTERSPEECH 2015 paralinguistic challenges. The classifier consists of a convolutional neural network with log-Mel spectra as input. Contributor: Valentin Barriere (July 2015).

  5. Demo of a gesture recognition prototype based on HMMs. Contributors: Baptiste Angles, Patrice Guyot, Christophe Mangou (June 2014).
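
The saliency-masking step described in item 2 can be sketched roughly as follows. This is a minimal illustration, not the code from the repository: it assumes the saliency map has the same shape as a complex STFT of the signal, holds real values in [0, 1], and that the original phase is reused for resynthesis. The function names (stft, istft, saliency_masked_audio) and the frame parameters are hypothetical choices made for this example. In the real demo the saliency map would come from the trained CNN; any array of the right shape can stand in for it here.

```python
import numpy as np


def stft(signal, frame_len=256, hop=128):
    """Short-time Fourier transform with a periodic Hann window."""
    # Periodic Hann: overlapping windows sum to 1 when hop == frame_len // 2.
    window = np.hanning(frame_len + 1)[:-1]
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack(
        [signal[i * hop: i * hop + frame_len] * window for i in range(n_frames)]
    )
    return np.fft.rfft(frames, axis=1)  # shape: (n_frames, frame_len // 2 + 1)


def istft(spec, frame_len=256, hop=128):
    """Inverse STFT by plain overlap-add (no synthesis window)."""
    frames = np.fft.irfft(spec, n=frame_len, axis=1)
    out = np.zeros(hop * (len(frames) - 1) + frame_len)
    for i, frame in enumerate(frames):
        out[i * hop: i * hop + frame_len] += frame
    return out


def saliency_masked_audio(signal, saliency, frame_len=256, hop=128):
    """Weight each time-frequency bin by its saliency and resynthesize.

    Assumption: `saliency` is real-valued with the same shape as the STFT,
    so the multiplication scales magnitudes and keeps the original phase.
    """
    spec = stft(signal, frame_len, hop)
    if saliency.shape != spec.shape:
        raise ValueError("saliency map must match the STFT shape")
    return istft(spec * saliency, frame_len, hop)
```

With these settings the first and last hop samples are covered by a single window only, so they come out attenuated even with an all-ones map; the interior of the signal is reconstructed exactly in that case.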