Features Extraction Pseudo Syllable Generic GLR/BIC Audio-Video Segmentation Similarity Matrix

 

Audio Segmentation Automatic Character Labelling in Video Human Motion Analysis Monophony / Polyphony Distinction

 

Rhythm Estimation Deformable Non Deformable Object Classification Multiple Sources Detection Choir Detection

 

Spectral cover Segmentation in singer turns Characterizing Pathological Voices

 

The team has substantial know-how and expertise in low-level segmentation.

In audio, most of the work relies on the forward-backward segmentation algorithm. A robust version (tolerant of adverse environments, and language and speaker independent) makes it possible to locate the relevant information, extract it, and use it in various domains:

  • In automatic language identification: starting from the identification of vocalic segments, a new prosodic unit, called the pseudo-syllable, has been defined to characterize rhythm and intonation. Prosody can thus be modeled and introduced into an automatic language identification system to complement the acoustic and phonetic modeling.
  • In automatic speaker verification: the automatic segmentation provides the transient zones, which carry speaker-specific information.
  • In speech/music detection: the segmentation process behaves quite differently on speech and on music. Modeling the distribution of the segments makes the speech/music discrimination more robust.
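As an illustration of the pseudo-syllable idea, here is a minimal sketch. It assumes that the segmentation step has already labeled each segment as vocalic ("V") or non-vocalic ("C"), and that a pseudo-syllable groups consecutive non-vocalic segments with the vocalic segment that follows them. The grouping rule, the function names `group_pseudo_syllables` and `rhythm_features`, and the chosen features are assumptions made for this example, not a description of the team's implementation.

```python
from typing import List, Tuple

# A segment is (start_s, end_s, label); label is "V" (vocalic) or "C" (other).
# These labels are assumed to come from an earlier segmentation step.
Segment = Tuple[float, float, str]


def group_pseudo_syllables(segments: List[Segment]) -> List[List[Segment]]:
    """Group segments into units that each end on a vocalic segment."""
    units: List[List[Segment]] = []
    current: List[Segment] = []
    for seg in segments:
        current.append(seg)
        if seg[2] == "V":
            units.append(current)
            current = []
    # Trailing non-vocalic segments with no following vowel are dropped
    # (an assumption of this sketch).
    return units


def rhythm_features(unit: List[Segment]) -> Tuple[float, float, int]:
    """Return (non-vocalic duration, vocalic duration, non-vocalic count)."""
    cons_dur = sum(end - start for start, end, lab in unit if lab == "C")
    vowel_dur = sum(end - start for start, end, lab in unit if lab == "V")
    n_cons = sum(1 for _, _, lab in unit if lab == "C")
    return cons_dur, vowel_dur, n_cons
```

Simple per-unit durations like these could then feed a statistical rhythm model that complements the acoustic and phonetic models.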

In video, most analyses start from a preliminary segmentation into shots, obtained by detecting hard cuts and localizing dissolves. Extensions to this tool also make it possible to analyze compositing effects (overlay detection, split-screen localization, and so on). In some cases, a spatiotemporal representation of the content, called an “X-ray” image, is computed to obtain a micro-segmentation into segments of homogeneous camera work.
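To make the notion of hard cut detection concrete, the sketch below uses a common baseline: comparing the gray-level histograms of consecutive frames and reporting a cut when they differ strongly. This is a generic technique chosen for illustration, not necessarily the detector used by the team. The function names, the number of bins, and the threshold are all assumptions.

```python
from typing import List, Sequence

# A frame is a non-empty 2-D grid of gray levels in 0..255 (assumed format).
Frame = Sequence[Sequence[int]]


def histogram(frame: Frame, bins: int = 16) -> List[float]:
    """Normalized gray-level histogram of one frame."""
    counts = [0] * bins
    total = 0
    for row in frame:
        for pixel in row:
            counts[min(pixel * bins // 256, bins - 1)] += 1
            total += 1
    return [c / total for c in counts]


def detect_hard_cuts(frames: Sequence[Frame],
                     threshold: float = 0.5,
                     bins: int = 16) -> List[int]:
    """Return indices i where a cut is assumed between frame i-1 and frame i."""
    cuts: List[int] = []
    previous = None
    for index, frame in enumerate(frames):
        current = histogram(frame, bins)
        if previous is not None:
            # Half the L1 distance between normalized histograms lies in [0, 1].
            distance = 0.5 * sum(abs(a - b) for a, b in zip(current, previous))
            if distance > threshold:
                cuts.append(index)
        previous = current
    return cuts
```

A fixed threshold like this only targets abrupt transitions. Gradual effects such as dissolves spread the change over many frames and need a different treatment.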