0701, 2007

Speaker Diarization

Jpetiot/ January 7, 2007/ Applications

Context In the context of audio document indexing and retrieval, speaker diarization is the process that detects speaker turns and groups together those uttered by the same speaker. It is generally based on a first step of segmentation (often preceded by a speech detection phase) that consists of partitioning the regions of speech into segments (each segment must be as long

0701, 2007

Temporal Relation Matrix

Jpetiot/ January 7, 2007/ Audiovisual Content Structuring

Context Feature extraction constitutes the first level of audiovisual content analysis. Basic characteristics are available through segmentation results and can be considered low-level descriptors. Our goal is to reach a higher level and propose new sets of descriptors related to relevant semantic and structural events occurring in the audiovisual content. This challenging issue can be addressed in different

0701, 2006

Jingle Detection

Jpetiot/ January 7, 2006/ Applications

Context A jingle is a short, repetitive melody, sometimes accompanied by speech, used to announce a change in the audio stream. We are interested in detecting and identifying jingles because we want to use them to structure and describe audio documents. Our algorithm is based on comparing a given jingle to the audio stream. This comparison is made by

0701, 2006

Language identification

Jpetiot/ January 7, 2006/ Applications

Context The aim of an Automatic Language Identification (ALI) system is to identify the language spoken in a speech excerpt only a few seconds long. The growing development of communication between humans, and between humans and computers, renews interest in such systems. The renewed relevance of multilingual interactive voice servers and of language characterization for audiovisual content indexing are examples of

0701, 2006

Human motion analysis

Jpetiot/ January 7, 2006/ Analysis

Context Human motion analysis is a problem that has been addressed in different ways depending on the intended goal. Methods using low- or high-level features such as optical flow have been proposed in the past. These are most often dedicated to one specific task, such as the recognition of a specific motion, and are therefore difficult to

0701, 2006

Audio Segmentation

Jpetiot/ January 7, 2006/ Analysis

Context One of the most difficult tasks in speech processing is to define the boundaries of the phonetic units present in the signal. Phones are strongly co-articulated and there are no clear boundaries between them, so the link between the linguistic and the acoustic segmentation is not simple to define. It does not matter which code level is chosen (word, syllable,

0701, 2006

Similarity Matrix

Jpetiot/ January 7, 2006/ Analysis

Context Similarity between two video documents is a concept that should take into account their content as well as their structure, in terms of temporal order. We propose an algorithm that performs this kind of comparison, from which we derive various applications such as a generic measure to estimate the “style similarity”, or a segmentation tool to split a long

0701, 2006

Pseudo Syllable

Jpetiot/ January 7, 2006/ Analysis

Context The notion of “Pseudo Syllable” was introduced to analyze language rhythms. It was intended for use in the prosodic module of our Automatic Language Identification system. Its relevance is confirmed by the results obtained on multilingual language discrimination tasks. The concept can be applied to other research fields, in particular to multilingual systems, where language-specific syllable segmentation cannot be

0701, 2005

Style Similarity Measure

Jpetiot/ January 7, 2005/ Applications

Context Based on a similarity matrix, we define a generic measure for video documents to identify style similarity. We consider that similarity in style relies on the occurrence of common elements between the compared documents, from a production point of view. These common elements, which we call production invariants, can be characterized by a combination of audiovisual characteristics. They can

0701, 2005

Automatic character Labelling in videos

Jpetiot/ January 7, 2005/ Analysis

Context Some experiments on automatic video summarization showed that the costume feature is one of the most significant clues for identifying keyframes belonging to some given excerpts. This property is mainly explained by the fact that costumes are attached to the character's function in the video document. The approach proposed here consists first of characterizing the region located