PhD in Computer Science and Telecommunications (2004)
Title: Audio Indexing: Primary Components Retrieval for Audiovisual Structuring
Processing the large quantity of available audiovisual information quickly and intelligently
requires robust, automatic tools.
This work addresses the indexing and structuring of the soundtrack of multimedia documents.
Its goal is to detect the primary components: speech, music and key sounds.
For speech/music classification, three unusual parameters are extracted:
entropy modulation, stationary segment duration (with a Forward-Backward Divergence algorithm)
and the number of segments.
These three parameters are combined with the classical 4 Hz modulation energy.
Experiments on radio corpora show the robustness of these parameters.
The resulting system is compared with a classical system and then fused with it.
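To make the 4 Hz modulation energy concrete: speech energy tends to fluctuate at roughly the syllable rate, around 4 Hz, so the share of envelope energy near that rate is a common speech cue. The sketch below is a simplified stand-in, not the implementation used in the thesis. The frame length, the 3 to 5 Hz band, the function names and the synthetic test tones are all assumptions made for illustration.

```python
import math

def frame_energies(signal, frame_len):
    """Mean energy of consecutive non-overlapping frames (the energy envelope)."""
    n_frames = len(signal) // frame_len
    return [
        sum(x * x for x in signal[i * frame_len:(i + 1) * frame_len]) / frame_len
        for i in range(n_frames)
    ]

def modulation_energy_ratio(envelope, frame_rate, target_hz=4.0, half_band_hz=1.0):
    """Share of the envelope's modulation energy within target_hz +/- half_band_hz."""
    n = len(envelope)
    if n < 2:
        return 0.0
    mean = sum(envelope) / n
    centered = [e - mean for e in envelope]  # remove the DC component
    in_band = total = 0.0
    for k in range(1, n // 2 + 1):  # plain DFT over positive frequencies
        freq = k * frame_rate / n
        angle = 2 * math.pi * k / n
        re = sum(c * math.cos(angle * t) for t, c in enumerate(centered))
        im = sum(c * math.sin(angle * t) for t, c in enumerate(centered))
        power = re * re + im * im
        total += power
        if abs(freq - target_hz) <= half_band_hz:
            in_band += power
    return in_band / total if total > 0 else 0.0

def am_tone(mod_hz, carrier_hz=400.0, sample_rate=8000, seconds=2.0, depth=0.8):
    """Synthetic test signal: a tone whose amplitude is modulated at mod_hz."""
    n = int(sample_rate * seconds)
    return [
        (1 + depth * math.sin(2 * math.pi * mod_hz * t / sample_rate))
        * math.sin(2 * math.pi * carrier_hz * t / sample_rate)
        for t in range(n)
    ]

sample_rate, frame_len = 8000, 80      # 10 ms frames, i.e. 100 frames per second
frame_rate = sample_rate / frame_len
ratio_4hz = modulation_energy_ratio(frame_energies(am_tone(4.0), frame_len), frame_rate)
ratio_12hz = modulation_energy_ratio(frame_energies(am_tone(12.0), frame_len), frame_rate)
```

On these synthetic tones, the one modulated at 4 Hz yields a much higher ratio than the one modulated at 12 Hz. A real system would compute such a measure on sliding windows and feed it to a classifier together with the other parameters.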
A second partitioning consists of detecting relevant key sounds.
For jingles, candidates are selected by comparing the "signature" of each jingle with the data stream.
This system is simple, fast and efficient.
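The comparison of a jingle signature with the data stream can be sketched as a sliding-window match. This is only an illustration under stated assumptions: the abstract does not say which features or distance are used, so the signature is taken here to be a short sequence of feature vectors, the measure is a mean Euclidean distance, and the names `mean_distance`, `find_jingle` and `threshold` are hypothetical.

```python
import math

def mean_distance(window, signature):
    """Mean Euclidean distance between aligned feature vectors."""
    return sum(math.dist(w, s) for w, s in zip(window, signature)) / len(signature)

def find_jingle(stream, signature, threshold):
    """Return start indices where the stream matches the signature.

    A position is kept when its distance is at most `threshold` and is a
    local minimum, so one jingle occurrence gives one candidate.
    """
    n, m = len(stream), len(signature)
    distances = [mean_distance(stream[i:i + m], signature) for i in range(n - m + 1)]
    hits = []
    for i, d in enumerate(distances):
        left = distances[i - 1] if i > 0 else math.inf
        right = distances[i + 1] if i + 1 < len(distances) else math.inf
        if d <= threshold and d <= left and d < right:
            hits.append(i)
    return hits

# Toy data (assumed): 2-D feature vectors, with the jingle inserted twice.
background = [0.0, 0.0]
signature = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
stream = [background] * 5 + signature + [background] * 4 + signature + [background] * 3
candidates = find_jingle(stream, signature, threshold=0.5)
```

A scheme of this kind needs one pass over the stream per jingle and no training, which is in keeping with a simple and fast detector.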
Applause and laughter detection is based on Gaussian mixture models (GMM) applied to spectral analysis.
Experiments on a TV corpus give encouraging results for this study.
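As an illustration of GMM-based detection, the sketch below scores a segment of feature frames under one mixture per class and keeps the most likely class. It is a sketch under assumptions, not the system from the thesis. The diagonal covariances, the two-dimensional features, the class labels and every parameter value are invented for the example, and model training (usually done with the EM algorithm) is not shown.

```python
import math

def log_gaussian_diag(x, mean, var):
    """Log density of a Gaussian with diagonal covariance."""
    return -0.5 * sum(
        math.log(2 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )

def gmm_log_likelihood(x, gmm):
    """Log likelihood of one frame; gmm is a list of (weight, mean, var)."""
    logs = [math.log(w) + log_gaussian_diag(x, mean, var) for w, mean, var in gmm]
    peak = max(logs)  # log-sum-exp for numerical stability
    return peak + math.log(sum(math.exp(v - peak) for v in logs))

def classify_segment(frames, models):
    """Label whose GMM gives the highest total log likelihood over the frames."""
    scores = {
        label: sum(gmm_log_likelihood(f, gmm) for f in frames)
        for label, gmm in models.items()
    }
    return max(scores, key=scores.get)

# Hypothetical models: parameters are made up, not trained on any corpus.
models = {
    "applause": [(0.5, [0.0, 0.0], [1.0, 1.0]), (0.5, [1.0, 1.0], [1.0, 1.0])],
    "other": [(1.0, [5.0, 5.0], [1.0, 1.0])],
}
label_a = classify_segment([[0.2, 0.1], [0.9, 1.1]], models)
label_b = classify_segment([[5.1, 4.8]], models)
```

Summing frame log likelihoods over a segment treats the frames as independent, a common simplification in this kind of classifier.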
Key word detection is carried out in a conventional way:
the aim here is not to improve existing systems but to use them in a structuring task,
since these key words indicate the program type (news, weather, documentary...).
Two studies then examine how these components can be used to recover the temporal structure of audiovisual documents.
The first study detects a recurring production invariant in program collections.
The second structures TV news into topics.
Some examples of the contribution of video analysis are also developed.
Thesis and presentation (in French)