IRIT

The expertise of the SAMoVA team (Structuring, Analysis and Modeling of Video and Audio documents) at the IRIT laboratory (UMR 5505 CNRS / Paul Sabatier University / INPT) is relevant to the project at several levels:

Part of the team's work relies on an algorithm that segments the speech signal into quasi-stationary zones. This segmentation enables more relevant information extraction for prosodic analysis, language or speaker identification, and speech/music detection, whatever the environment, the noise, the speakers or the language. It also adds a temporal dimension that is rarely taken into account explicitly in speech processing. A speech/music detection system developed by Julien Pinquier takes advantage of this segmentation and uses discriminant parameters such as the 4 Hz energy modulation and the entropy modulation. Given the efficiency of this system, its adaptation to the noisy recordings of the DIADEMS project is expected to be of great interest.
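To illustrate the idea behind the 4 Hz energy modulation parameter, the sketch below measures how much of the fluctuation of the short-term energy envelope falls around 4 Hz, which is roughly the syllabic rate of speech. It is a simplified illustration and not the SAMoVA implementation: the function names, the 20 ms / 10 ms framing and the 2 to 6 Hz band are all assumptions chosen for the example.

```python
import numpy as np

def frame_energy(signal, sample_rate, win_s=0.02, hop_s=0.01):
    """Short-term energy per frame (assumed 20 ms windows, 10 ms hop)."""
    win = int(round(win_s * sample_rate))
    hop = int(round(hop_s * sample_rate))
    n_frames = 1 + (len(signal) - win) // hop
    energy = np.empty(n_frames)
    for i in range(n_frames):
        frame = signal[i * hop : i * hop + win]
        energy[i] = np.mean(frame ** 2)
    return energy, 1.0 / hop_s  # energy envelope and its sampling rate (Hz)

def modulation_4hz(signal, sample_rate, band=(2.0, 6.0)):
    """Share of the energy-envelope fluctuation lying in a band around 4 Hz."""
    energy, frame_rate = frame_energy(signal, sample_rate)
    energy = energy / (np.mean(energy) + 1e-12)      # normalise by mean energy
    spectrum = np.abs(np.fft.rfft(energy - np.mean(energy))) ** 2
    freqs = np.fft.rfftfreq(len(energy), d=1.0 / frame_rate)
    in_band = (freqs >= band[0]) & (freqs <= band[1])
    total = np.sum(spectrum[1:]) + 1e-12             # ignore the DC bin
    return float(np.sum(spectrum[in_band]) / total)

if __name__ == "__main__":
    sr = 8000
    t = np.arange(0, 2.0, 1.0 / sr)
    tone = np.sin(2 * np.pi * 440 * t)                       # steady tone
    pulsed = (1 + 0.9 * np.sin(2 * np.pi * 4 * t)) * tone    # envelope pulsing at 4 Hz
    print("pulsed:", modulation_4hz(pulsed, sr))
    print("steady:", modulation_4hz(tone, sr))
```

A signal whose envelope pulses at the syllabic rate scores high on this measure, whereas a sustained tone scores low; a real detector would compute the measure per segment and combine it with other parameters.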

A speaker diarization system was recently developed in the team by El Khoury: its GLR-BIC segmentation reliably detects short segments (about half a second long), unlike most other systems. The clustering is based on the EVSM (Eigen Vector Space Model) method.
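As background for the BIC part of this segmentation, the sketch below applies the generic Delta-BIC test for a single change point, with one full-covariance Gaussian per segment. It is a textbook formulation and not the specific GLR-BIC procedure of the team; the function names, the `margin` parameter and the penalty weight are assumptions made for the example.

```python
import numpy as np

def log_det_cov(x):
    """Log-determinant of the maximum-likelihood covariance of x (n x d)."""
    cov = np.atleast_2d(np.cov(x, rowvar=False, bias=True))
    cov = cov + 1e-6 * np.eye(cov.shape[0])   # small floor for numerical safety
    _, logdet = np.linalg.slogdet(cov)
    return logdet

def delta_bic(features, t, penalty_weight=1.0):
    """Delta-BIC for a hypothesised change at frame t; positive favours a change."""
    n, d = features.shape
    left, right = features[:t], features[t:]
    # Generalised likelihood ratio: one Gaussian versus two Gaussians.
    glr = 0.5 * (n * log_det_cov(features)
                 - t * log_det_cov(left)
                 - (n - t) * log_det_cov(right))
    # Penalty for the extra mean vector and covariance matrix.
    n_params = d + d * (d + 1) / 2
    return float(glr - 0.5 * penalty_weight * n_params * np.log(n))

def best_change_point(features, margin=20, penalty_weight=1.0):
    """Scan candidate frames and return (frame index, Delta-BIC) of the best one."""
    candidates = range(margin, len(features) - margin)
    scores = [delta_bic(features, t, penalty_weight) for t in candidates]
    best = int(np.argmax(scores))
    return candidates[best], scores[best]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    speaker_a = rng.normal(0.0, 1.0, size=(200, 2))   # synthetic feature frames
    speaker_b = rng.normal(5.0, 1.0, size=(200, 2))
    frame, score = best_change_point(np.vstack([speaker_a, speaker_b]))
    print("change hypothesised at frame", frame, "with Delta-BIC", score)
```

A change is accepted only when the best Delta-BIC is positive. The `margin` keeps enough frames on each side to estimate a covariance, which is one reason why very short segments are hard to detect with a plain BIC scan.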

A singing detection system has been developed at IRIT by Hélène Lachambre. The method exploits the concept of vibrato: its presence is quantified through a parameter called "extended vibrato", and a simple thresholding step yields a very effective singing detector. During the evaluation, performance was improved by adapting the computation of this parameter and the decision threshold to the sound context, "a cappella voice" or "voice over instruments". A very original polyphony detection system has also been developed; it will be provided to the project and used as a pre-processing step for the instrumental study.

A method based on Gaussian mixture models has been used to model sounds such as applause. This type of learning will serve as a basis for background noise recognition. A system for detecting and identifying audio jingles in radio broadcasts has also been developed by Julien Pinquier. The current approach will be reused and adapted to characterize more specific noises (such as ambient sounds).
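As a reminder of how Gaussian mixture modelling can be used to recognise a sound class, the sketch below fits one diagonal-covariance mixture per class with the EM algorithm and assigns a test excerpt to the class whose model gives the higher average log-likelihood. It is a minimal illustration on synthetic features, not the system described above; the function names, the number of components and the data are all assumptions.

```python
import numpy as np

def log_gaussians(x, means, variances):
    """Log-density of each frame (n x d) under each diagonal Gaussian (k x d)."""
    diff = x[:, None, :] - means[None, :, :]
    return -0.5 * (np.sum(diff ** 2 / variances[None, :, :], axis=2)
                   + np.sum(np.log(2 * np.pi * variances), axis=1)[None, :])

def logsumexp(a, axis):
    m = np.max(a, axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.sum(np.exp(a - m), axis=axis))

def fit_gmm(x, n_components=2, n_iter=50, seed=0):
    """Fit a diagonal-covariance GMM with EM; returns (weights, means, variances)."""
    rng = np.random.default_rng(seed)
    n, _ = x.shape
    means = x[rng.choice(n, n_components, replace=False)]
    variances = np.tile(np.var(x, axis=0) + 1e-6, (n_components, 1))
    weights = np.full(n_components, 1.0 / n_components)
    for _ in range(n_iter):
        # E-step: responsibility of each component for each frame.
        log_joint = np.log(weights)[None, :] + log_gaussians(x, means, variances)
        resp = np.exp(log_joint - logsumexp(log_joint, axis=1)[:, None])
        # M-step: re-estimate weights, means and variances.
        counts = resp.sum(axis=0) + 1e-10
        weights = counts / counts.sum()
        means = (resp.T @ x) / counts[:, None]
        second_moment = (resp.T @ x ** 2) / counts[:, None]
        variances = np.maximum(second_moment - means ** 2, 1e-6)
    return weights, means, variances

def avg_log_likelihood(x, model):
    """Average per-frame log-likelihood of x under a fitted model."""
    weights, means, variances = model
    log_joint = np.log(weights)[None, :] + log_gaussians(x, means, variances)
    return float(np.mean(logsumexp(log_joint, axis=1)))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    applause_train = rng.normal(0.0, 1.0, size=(300, 3))   # synthetic "applause" features
    other_train = rng.normal(4.0, 1.0, size=(300, 3))      # synthetic "other" features
    applause_model = fit_gmm(applause_train)
    other_model = fit_gmm(other_train)
    excerpt = rng.normal(0.0, 1.0, size=(50, 3))
    is_applause = (avg_log_likelihood(excerpt, applause_model)
                   > avg_log_likelihood(excerpt, other_model))
    print("classified as applause:", is_applause)
```

In practice the frames would be acoustic feature vectors (cepstral coefficients, for instance) and each model would be trained on labelled recordings of the corresponding sound class.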