| Features Extraction | Pseudo Syllable | Generic GLR/BIC Audio-Video Segmentation | Similarity Matrix |
| Automatic Speech Segmentation | Automatic Character Labelling in Video | Human Motion Analysis | Monophony / Polyphony Distinction |
The team has substantial know-how and expertise in low-level segmentation.
In audio, most of the work relies on the forward-backward segmentation algorithm. A robust version (resistant to adverse environments, language- and speaker-independent) makes it possible to locate the pertinent information, extract it, and use it in various domains:
- In automatic language identification: starting from the identification of vocalic segments, a new prosodic unit, called the pseudo-syllable, has been defined to characterize rhythm and intonation. Prosody can thus be modeled and introduced into an automatic language identification system to complement the acoustic and phonetic modeling.
- In automatic speaker verification: the automatic segmentation provides the transient zones, which are speaker-informative.
- In speech/music detection: the behavior of the segmentation process is quite different in speech and music. The modeling of the segment distribution makes the speech/music discrimination more robust.
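To make the last item more concrete, the sketch below shows one way the segment distribution could be summarized once a segmentation is available. It is only an illustration under stated assumptions: the function name `segment_features`, the window bounds, and the choice of statistics (segments per second, mean and standard deviation of segment duration) are hypothetical and are not taken from the team's system. The boundary times are made-up values.

```python
from statistics import mean, pstdev


def segment_features(boundaries, window_start, window_end):
    """Summarize the segments lying fully inside [window_start, window_end].

    `boundaries` is a sorted list of segment boundary times in seconds,
    as an automatic segmentation might produce (hypothetical input format).
    """
    durations = [
        end - start
        for start, end in zip(boundaries, boundaries[1:])
        if start >= window_start and end <= window_end
    ]
    if not durations:
        return {"segments_per_second": 0.0, "mean_duration": 0.0, "std_duration": 0.0}
    span = window_end - window_start
    return {
        "segments_per_second": len(durations) / span,
        "mean_duration": mean(durations),
        "std_duration": pstdev(durations),
    }


# Illustrative boundaries (seconds) over a one-second analysis window.
boundaries = [0.0, 0.05, 0.12, 0.20, 0.45, 0.50, 1.0]
features = segment_features(boundaries, 0.0, 1.0)
print(features)
```

Statistics of this kind could then be fed to any classifier. Which statistics and which model are actually used is not specified here.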
In video, most analyses build on a preliminary segmentation into shots, obtained by hard-cut detection and dissolve localization. Some extensions to this tool also allow the analysis of compositing effects (overlay detection, split-screen localization, and so on). In some cases, a spatiotemporal representation of the content, called an “X-ray” image, is computed to obtain a micro-segmentation into units of homogeneous camera work.
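As an illustration of the shot segmentation step, here is a minimal hard-cut detector based on gray-level histogram differences between consecutive frames. This is a common baseline, swapped in for illustration only: the text does not say which detector the team uses. The names `histogram` and `detect_hard_cuts`, the 16-bin quantization, and the threshold of 0.5 are all assumptions. Dissolves are gradual transitions and would need a different test; they are not handled here.

```python
def histogram(frame, bins=16, max_value=256):
    """Normalized gray-level histogram of a frame.

    `frame` is a non-empty list of rows of integers in [0, max_value).
    """
    counts = [0] * bins
    total = 0
    for row in frame:
        for pixel in row:
            counts[pixel * bins // max_value] += 1
            total += 1
    return [c / total for c in counts]


def detect_hard_cuts(frames, threshold=0.5):
    """Return indices i such that a cut is declared between frames i-1 and i."""
    if not frames:
        return []
    cuts = []
    previous = histogram(frames[0])
    for i in range(1, len(frames)):
        current = histogram(frames[i])
        # Half the L1 distance lies in [0, 1]:
        # 0 for identical histograms, 1 for histograms with no overlap.
        distance = 0.5 * sum(abs(a - b) for a, b in zip(previous, current))
        if distance > threshold:
            cuts.append(i)
        previous = current
    return cuts


# Tiny synthetic "video": two dark frames, two bright frames, one dark frame.
dark = [[10, 12], [11, 13]]
bright = [[200, 210], [205, 215]]
frames = [dark, dark, bright, bright, dark]
cuts = detect_hard_cuts(frames)
print(cuts)
```

In practice the threshold would have to be tuned on real footage, and a fixed global value is only the simplest possible choice.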