Audio Primary Component Detection

Jpetiot/ January 7, 2007/ Applications

Context

Primary component detection is a first step in audiovisual indexing. The objective is to describe the type of audio content (speech, music, singing voice) so as to provide information such as which parts of the document should be transcribed. In our studies, we consider four primary components: speech, music, singing voice, and jingles. We build one detector for each component, and we look for methods that do not need a learning phase.

System overview

Our system is composed of four independent detectors, one for each component. The global schema is shown in Figure 1.

Figure 1: Global schema for primary component retrieval

We want to build detectors that can be applied to any document. For each detector, we therefore look for descriptors that are document-independent.

Music detector

First, the signal is segmented using the Forward-Backward segmentation. The descriptors are the number of segments per second and the length of the longest segments.
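As an illustration of the two music descriptors, here is a minimal Python sketch. It assumes the segmentation has already been run and has produced a sorted list of boundary times in seconds. The function name `music_descriptors` is hypothetical, and taking the single longest segment is an assumption, since the text does not say how many of the longest segments are used.

```python
from typing import List, Tuple


def music_descriptors(boundaries: List[float]) -> Tuple[float, float]:
    """Return (segments per second, longest segment length in seconds).

    `boundaries` holds sorted segment boundary times in seconds,
    including the start and the end of the analysed excerpt.
    """
    if len(boundaries) < 2:
        raise ValueError("need at least two boundaries")
    # Duration of each segment between consecutive boundaries.
    durations = [b - a for a, b in zip(boundaries, boundaries[1:])]
    total = boundaries[-1] - boundaries[0]
    if total <= 0:
        raise ValueError("boundaries must span a positive duration")
    segments_per_second = len(durations) / total
    longest = max(durations)
    return segments_per_second, longest


# Four segments over four seconds, the longest lasting two seconds.
print(music_descriptors([0.0, 0.5, 1.0, 3.0, 4.0]))  # (1.0, 2.0)
```

A decision rule on these two values (for example, thresholds on each) is not given in the text and is left out of the sketch.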

Speech detector

For speech detection, the descriptors are the 4 Hz modulation of energy, which reflects the 4 Hz syllabic rate of speech; the modulation of entropy, which reflects the fact that speech is a disordered signal; and the harmonic coefficient.
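To make the 4 Hz energy modulation idea concrete, the sketch below measures how much of the modulation of a frame-energy envelope sits at 4 Hz. It is a simplified stand-in, not the detector described in the publications: it uses a single-bin discrete Fourier transform instead of a filter bank, and the function names, the probe frequencies (1 to 8 Hz) and the normalisation are all assumptions made for illustration.

```python
import math


def modulation_power(envelope, frame_rate, freq):
    """Power of the mean-removed envelope at one modulation frequency.

    `envelope` is a list of frame energies, `frame_rate` is the number
    of frames per second, `freq` is the modulation frequency in Hz.
    """
    n_frames = len(envelope)
    mean = sum(envelope) / n_frames
    re = im = 0.0
    for n, e in enumerate(envelope):
        phase = 2.0 * math.pi * freq * n / frame_rate
        re += (e - mean) * math.cos(phase)
        im += (e - mean) * math.sin(phase)
    return (re * re + im * im) / (n_frames * n_frames)


def four_hz_ratio(envelope, frame_rate, probes=(1, 2, 3, 4, 5, 6, 7, 8)):
    """Share of the probed modulation power that lies at 4 Hz."""
    target = modulation_power(envelope, frame_rate, 4.0)
    total = sum(modulation_power(envelope, frame_rate, f) for f in probes)
    return target / total if total > 0 else 0.0


# Synthetic envelope: 2 s at 100 frames per second, modulated at 4 Hz.
envelope = [1.0 + 0.5 * math.sin(2 * math.pi * 4 * n / 100) for n in range(200)]
print(four_hz_ratio(envelope, 100.0))
```

On this synthetic envelope the ratio is close to 1; an envelope modulated at another rate gives a ratio close to 0.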

Singing voice detector

Detection is based on the presence of vibrato. To apply this descriptor to polyphonic signals, we first perform a pseudo-temporal segmentation.
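The vibrato test can be pictured with the following Python sketch. It assumes that a single frequency track (in Hz, sampled at a fixed rate) has already been extracted for one segment, and it does not cover the pseudo-temporal segmentation or the handling of polyphony. The zero-crossing estimate, the 4 to 8 Hz rate range and the minimum extent of 2 Hz are illustrative assumptions, not values taken from the text.

```python
def vibrato_rate(f0_track, track_rate):
    """Estimate the modulation rate (Hz) of a frequency track.

    Counts sign changes of the mean-removed track: a periodic
    modulation crosses its mean twice per cycle.
    """
    mean = sum(f0_track) / len(f0_track)
    centered = [f - mean for f in f0_track]
    crossings = sum(
        1 for a, b in zip(centered, centered[1:]) if (a < 0) != (b < 0)
    )
    duration = (len(f0_track) - 1) / track_rate
    return crossings / (2.0 * duration)


def has_vibrato(f0_track, track_rate,
                min_rate=4.0, max_rate=8.0, min_extent_hz=2.0):
    """Return True if the track oscillates at a vibrato-like rate."""
    if len(f0_track) < 3:
        return False
    # Half the peak-to-peak excursion of the track, in Hz.
    extent = (max(f0_track) - min(f0_track)) / 2.0
    rate = vibrato_rate(f0_track, track_rate)
    return extent >= min_extent_hz and min_rate <= rate <= max_rate
```

With this rule, a steady note and a slow glide are both rejected: the first has no excursion, the second crosses its mean too rarely.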

Jingle detector

The jingle detector computes a spectral distance between a “reference jingle” and the audio document.
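One possible reading of this comparison is a sliding-window search, sketched below in Python. It assumes that both the reference jingle and the document are already represented as sequences of spectral vectors (one per frame). The Euclidean distance, the averaging over frames and the threshold are assumptions chosen for illustration, and the function names are hypothetical.

```python
import math


def spectral_distance(ref, window):
    """Mean Euclidean distance between two equal-length sequences
    of spectral vectors (one vector per frame)."""
    return sum(math.dist(r, w) for r, w in zip(ref, window)) / len(ref)


def find_jingle(ref, doc, threshold):
    """Slide the reference over the document and return the list of
    (start frame, distance) pairs whose distance is below the threshold."""
    hits = []
    for start in range(len(doc) - len(ref) + 1):
        d = spectral_distance(ref, doc[start:start + len(ref)])
        if d <= threshold:
            hits.append((start, d))
    return hits


# Toy example with 2-bin "spectra": the jingle starts at frame 1.
reference = [[1.0, 0.0], [0.0, 1.0]]
document = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
print(find_jingle(reference, document, threshold=0.1))  # [(1, 0.0)]
```

`math.dist` requires Python 3.8 or later.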

Post-Processing

Finally, a post-processing step is necessary, in order to smooth the results. At this step, we use two simple rules:

Singing = Singing \ Speech

Music = Music ∪ Singing
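The two rules can be sketched on per-frame decisions as follows. This Python example assumes each detector outputs one boolean per frame and that the rules are applied in the order listed, so that the music rule uses the already-corrected singing labels; both points are assumptions, and the function name is hypothetical.

```python
def post_process(speech, music, singing):
    """Apply the two set rules frame by frame.

    Each argument is a list of booleans, one per frame.
    Returns the corrected (music, singing) label lists.
    """
    # Singing = Singing \ Speech: drop singing where speech is detected.
    singing_out = [s and not sp for s, sp in zip(singing, speech)]
    # Music = Music U Singing: any remaining singing frame counts as music.
    music_out = [m or s for m, s in zip(music, singing_out)]
    return music_out, singing_out


speech = [True, False, False, True]
music = [False, False, True, False]
singing = [True, True, False, False]
print(post_process(speech, music, singing))
# ([False, True, True, False], [False, True, False, False])
```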

Main Publications

Hélène Lachambre, Régine André-Obrecht, Julien Pinquier. Singing Voice Characterization for Audio Indexing. In: European Signal and Image Processing Conference (EUSIPCO 2007), Poznan, Poland, 03/09/2007-07/09/2007, Polish Society for Theoretical and Applied Electrical Engineering (PTETiS), 2007.

Julien Pinquier, Régine André-Obrecht. Audio Indexing: Primary Components Retrieval – Robust Classification in Audio Documents. In: Multimedia Tools and Applications, Springer-Verlag, Vol. 30, No. 3, p. 313-330, September 2006.