0701, 2009

Singing Voice Detection

Jpetiot / January 7, 2009 / Applications

Context This research takes place in the context of audio indexing. After earlier work on the detection of speech and music, the question arises of where singing belongs: it is, after all, music produced by the human voice. In our Speech/Music system, it is classified mainly in the music category, but it was sometimes mistaken for speech. The purpose of this work

0701, 2009

High Level Feature Extraction

Jpetiot / January 7, 2009 / Applications

Context Most existing video search engines rely on context and textual metadata such as the title of the video, tags and comments written by users, etc. In other words, no attempt is made to understand the actual content of the video. Content-based audiovisual analysis aims at bridging this so-called semantic gap. Overview Our approach to this problem aims at being as generic,

0701, 2009

Shot Boundaries Detection

Jpetiot / January 7, 2009 / Applications

Context Nearly all methods of audio or video segmentation rely on a priori knowledge. These approaches are based on a spatio-temporal modelling of the content and use decision rules. Currently, it is the only way to reach the semantic quality required by search engines. But only highly structured recording collections, such as broadcast videos of news and sports programmes, and

0701, 2009

Monophony / Polyphony Distinction

Jpetiot / January 7, 2009 / Analysis

Context In many fields of music analysis (for example, source separation or instrument recognition), it could be useful to know how many instruments are present, or how many notes are played at the same time. We propose here a method for this last problem. Here, a “monophonic” sound is defined as one note played at a time (either played by an

0701, 2009

Generic GLR/BIC Audio-Video Segmentation

Jpetiot / January 7, 2009 / Analysis

Context We make the hypothesis that basic video or audio features take homogeneous values within a given context: this homogeneity can be exploited by a GLR-BIC segmentation algorithm. The homogeneity criterion is evaluated by how well these feature values can be described with a Gaussian law. This method consists in applying the GLR algorithm until convergence to the best partition of Gaussian
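The excerpt above names a GLR-BIC criterion built on Gaussian modelling of the feature values. As a rough illustration only, and not the authors' implementation, the following sketch tests a single change point in a one-dimensional feature sequence. The function names, the `margin` parameter and the `penalty_weight` parameter are assumptions introduced here; real systems typically use multivariate features and iterate over sliding windows.

```python
import math


def log_var(xs):
    """Log of the maximum-likelihood variance, floored to avoid log(0)."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / n
    return math.log(max(var, 1e-12))


def delta_bic(xs, t, penalty_weight=1.0):
    """Gain of modelling xs[:t] and xs[t:] with two Gaussians instead of one.

    The first term is the log generalized likelihood ratio (GLR) for 1-D
    Gaussians; the second is the BIC penalty for the two extra parameters
    (one mean, one variance). penalty_weight is a hypothetical tuning knob.
    """
    n = len(xs)
    glr = 0.5 * (n * log_var(xs) - t * log_var(xs[:t]) - (n - t) * log_var(xs[t:]))
    penalty = 0.5 * penalty_weight * 2 * math.log(n)
    return glr - penalty


def best_change_point(xs, margin=5, penalty_weight=1.0):
    """Return (t, score) for the best split, or (None, score) if no split helps.

    margin keeps at least that many samples on each side of the split.
    """
    candidates = range(margin, len(xs) - margin + 1)
    t = max(candidates, key=lambda c: delta_bic(xs, c, penalty_weight))
    score = delta_bic(xs, t, penalty_weight)
    return (t, score) if score > 0 else (None, score)


# Toy sequence: the feature level jumps after 100 samples.
shifted = [0.0, 1.0] * 50 + [10.0, 11.0] * 50
print(best_change_point(shifted)[0])  # 100
```

A positive score means the two-Gaussian description is preferred, so a boundary is hypothesized at `t`; a homogeneous sequence yields a negative score and no boundary.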

0701, 2009

Acoustic-to-articulatory Inversion

Jpetiot / January 7, 2009 / Modeling

Context The aim of acoustic-to-articulatory inversion is to recover the vocal tract shape from the acoustic signal of the speech produced. This recovery is done by estimating the position of flesh points located on the lips, tongue, jaw, and sometimes velum. In our case, 6 sensors are positioned on the upper lip, lower lip, jaw, front tongue, middle tongue and back tongue. System Overview First, an

0701, 2008

Program Boundaries Detection

Jpetiot / January 7, 2008 / Applications

Overview So far, very little research has addressed program boundary detection in TV broadcasts. All existing approaches are based on a spatio-temporal modelling of the content and decision rules. Currently, it is the only way to reach the semantic quality required by search engines. But only collections of recordings that follow the same structure can benefit from such methods. Furthermore,

0701, 2008

Adaptive User-Defined Similarity Measure

Jpetiot / January 7, 2008 / Audiovisual Content Structuring

Context For browsing audiovisual databases without being limited to a predefined application context, a user-dependent, interactive visual organization is desirable: given the same small subset S of documents, a user should be able to explore several geographical representations of it; the rest of the corpus (or a part of it) has to reorganize itself regarding

0701, 2007

Audio Video Characters Labelling

Jpetiot / January 7, 2007 / Applications

Context Much work has been carried out on audiovisual content characterization, and particularly on person detection. The majority of these studies are mono-media and detect a person either by visual appearance in a frame (such as a face) or by voice. More recent work approaches video content analysis by integrating both acoustic and visual features,

0701, 2007

Audio Primary Component Detection

Jpetiot / January 7, 2007 / Applications

Context Primary component detection is a first step in audiovisual indexing. The objective here is to describe the type of data (speech, music, singing voice) in order to provide information such as which parts of the document to transcribe. In our studies, we consider 4 primary components: speech, music, singing voice, and jingles. We construct a detector for