0701, 2009

Shot Boundaries Detection

Jpetiot/ January 7, 2009/ Applications

Context Nearly all audio or video segmentation methods rely on a priori knowledge. These approaches are based on spatio-temporal modelling of the content and use decision rules. Currently, this is the only way to reach the semantic quality required by search engines. But only highly structured collections of recordings, such as broadcast videos of news and sports programmes, and

0701, 2008

Program Boundaries Detection

Jpetiot/ January 7, 2008/ Applications

Overview Very little research has been done so far on program boundary detection in TV broadcasts. All existing approaches are based on spatio-temporal modelling of the content and on decision rules. Currently, this is the only way to reach the semantic quality required by search engines. But only collections of recordings that follow the same structure can benefit from such methods. Furthermore,

0701, 2007

Audio Primary Component Detection

Jpetiot/ January 7, 2007/ Applications

Context Primary component detection is a first step in audiovisual indexing. The objective here is to describe the type of data (speech, music, singing voice) in order to provide information such as which parts of the document to transcribe. In our studies, we consider four primary components: speech, music, singing voice, and jingles. We construct a detector for

0701, 2007

Audio Video Characters Labelling

Jpetiot/ January 7, 2007/ Applications

Context Much work has been carried out on audiovisual content characterization, particularly on person detection. Most of these studies are mono-media and detect a person either by visual appearance in a frame (such as a face) or by voice. More recent work approaches video content analysis by integrating both acoustic and visual features,

0701, 2007

Speaker Diarization

Jpetiot/ January 7, 2007/ Applications

Context In audio document indexing and retrieval, speaker diarization is the process that detects speaker turns and groups together those uttered by the same speaker. It generally starts with a segmentation step (often preceded by a speech detection phase) that partitions the speech regions into segments (each segment must be as long
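As a rough illustration of the grouping stage described above, the following sketch assigns speaker labels to segments that have already been cut. Everything in it is an assumption made for illustration: the excerpt does not say which features or clustering method are used, so each segment carries a single hypothetical scalar feature, and the function `regroup` uses a simple greedy nearest-centroid rule with an arbitrary threshold.

```python
def regroup(segments, threshold):
    """Greedy grouping of segments by speaker (illustrative only).

    segments: list of (start, end, feature) tuples, where `feature` is a
    hypothetical one-dimensional descriptor of the segment.
    Returns one integer speaker label per segment.
    """
    centroids = []  # one [feature_sum, count] pair per speaker found so far
    labels = []
    for _start, _end, feature in segments:
        best, best_dist = None, threshold
        for k, (total, count) in enumerate(centroids):
            dist = abs(feature - total / count)
            if dist < best_dist:
                best, best_dist = k, dist
        if best is None:
            # No existing speaker is close enough: open a new one.
            centroids.append([feature, 1])
            best = len(centroids) - 1
        else:
            # Update the running mean of the matched speaker.
            centroids[best][0] += feature
            centroids[best][1] += 1
        labels.append(best)
    return labels


# Toy data: four segments whose features fall into two clear groups.
segments = [(0.0, 2.5, 110.0), (2.5, 4.0, 205.0), (4.0, 7.0, 115.0), (7.0, 9.0, 198.0)]
labels = regroup(segments, threshold=30.0)
```

With this toy input, the first and third segments receive one label and the second and fourth receive another. A real system would work on multi-dimensional acoustic features and a more robust clustering criterion.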

0701, 2006

Jingle Detection

Jpetiot/ January 7, 2006/ Applications

Context A jingle is a short, repetitive melody, sometimes accompanied by speech, used to announce a change in the audio stream. We are interested in detecting and identifying jingles because we want to use them to structure and describe audio documents. Our algorithm is based on comparing a given jingle to the audio stream. This comparison is made by
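The general idea of comparing a reference jingle to an audio stream can be sketched as a sliding-window match. The excerpt is cut off before it describes the actual comparison, so the distance used here (mean absolute difference on a hypothetical one-dimensional feature sequence), the function names, and the threshold are all assumptions chosen for illustration, not the method of the post.

```python
def sliding_distances(reference, stream):
    """Mean absolute difference between the reference and every
    equally long window of the stream (one value per window start)."""
    n = len(reference)
    distances = []
    for start in range(len(stream) - n + 1):
        window = stream[start:start + n]
        distances.append(sum(abs(r - w) for r, w in zip(reference, window)) / n)
    return distances


def detect(reference, stream, threshold):
    """Window start positions whose distance to the reference is at most `threshold`."""
    return [i for i, d in enumerate(sliding_distances(reference, stream)) if d <= threshold]


# Toy data: the reference pattern occurs once in the stream, starting at index 2.
jingle = [0.0, 1.0, 0.0, -1.0]
stream = [0.2, 0.1, 0.0, 1.0, 0.0, -1.0, 0.3, 0.2]
hits = detect(jingle, stream, threshold=0.05)
```

Here `hits` contains the single offset where the toy pattern appears. In practice the comparison would run on spectral features rather than raw values, and the threshold would need tuning on real recordings.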

0701, 2006

Language identification

Jpetiot/ January 7, 2006/ Applications

Context The aim of an Automatic Language Identification (ALI) system is to identify the language spoken in a speech excerpt a few seconds long. The growing development of communication between humans, and between humans and computers, renews interest in such systems. The revival of multilingual interactive voice servers and of language characterization for audiovisual content indexing is an example of

0701, 2005

Style Similarity Measure

Jpetiot/ January 7, 2005/ Applications

Context Based on a similarity matrix, we define a generic measure for identifying style similarity between video documents. We consider that similarity in style relies on the occurrence of common elements between the compared documents, from a production point of view. Those common elements – we call them production invariants – can be characterized by a combination of audiovisual characteristics. They can
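To make the notion of a similarity matrix between two documents concrete, here is a minimal sketch. It assumes each document is a list of segment feature vectors, uses cosine similarity between segments, and aggregates the matrix by averaging each row's best match. None of these choices come from the excerpt: the features, the `cosine` similarity, and the `style_score` aggregation are hypothetical stand-ins.

```python
from math import sqrt


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def similarity_matrix(doc_a, doc_b):
    """Entry [i][j] compares segment i of doc_a with segment j of doc_b."""
    return [[cosine(u, v) for v in doc_b] for u in doc_a]


def style_score(matrix):
    """Illustrative aggregate: mean over rows of the best match in each row."""
    return sum(max(row) for row in matrix) / len(matrix)


# Toy documents with two segments each, described by 2-D feature vectors.
doc_a = [[1.0, 0.0], [0.0, 1.0]]
doc_b = [[1.0, 0.0], [1.0, 1.0]]
matrix = similarity_matrix(doc_a, doc_b)
score = style_score(matrix)
```

The score is higher when many segments of the first document have a close counterpart in the second, which is one simple way to read "occurrence of common elements" off a matrix.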