Automated Audio Captioning

Elabbe/ March 7, 2023/ Modeling, Uncategorized, Research

Context: In recent years, deep learning systems for text generation, processing and understanding have improved significantly, making free-form text usable as a general-purpose interface between humans and machines. In sound event recognition, most tasks rely on a predefined set of classes, but natural language can convey much more information, which could improve

Acoustic-to-articulatory Inversion

Jpetiot/ January 7, 2009/ Modeling

Context: The aim of acoustic-to-articulatory inversion is to recover the shape of the vocal tract from the acoustic signal that was produced. This is done by estimating the positions of flesh points located on the lips, tongue, jaw and sometimes the velum. In our case, 6 sensors are positioned on the upper lip, lower lip, jaw, front of the tongue, middle of the tongue and back of the tongue. System Overview: First, an
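To make the input and output of inversion concrete, here is a minimal sketch of one classic family of approaches, codebook lookup. It is not the system described in this post. It assumes a hypothetical codebook of (acoustic vector, sensor positions) pairs built from parallel acoustic/articulatory recordings, and it represents each of the six sensors as an (x, y) point in the midsagittal plane. That representation is also an assumption.

```python
import math
from typing import Dict, List, Sequence, Tuple

# The six flesh points listed in the post. Giving each one an (x, y)
# position in the midsagittal plane is an assumption of this sketch.
SENSORS = ("upper_lip", "lower_lip", "jaw",
           "front_tongue", "middle_tongue", "back_tongue")

ArticulatoryFrame = Dict[str, Tuple[float, float]]
# Hypothetical codebook entry: (acoustic feature vector, sensor positions).
CodebookEntry = Tuple[Sequence[float], ArticulatoryFrame]


def invert_frame(acoustic: Sequence[float],
                 codebook: List[CodebookEntry]) -> ArticulatoryFrame:
    """Return the sensor positions paired with the nearest acoustic vector."""
    _, positions = min(codebook, key=lambda entry: math.dist(entry[0], acoustic))
    return positions


def invert_utterance(frames: List[Sequence[float]],
                     codebook: List[CodebookEntry]) -> List[ArticulatoryFrame]:
    """Invert each acoustic frame independently (no continuity constraint)."""
    return [invert_frame(frame, codebook) for frame in frames]
```

Inversion is ill-posed: different vocal tract shapes can produce very similar acoustics. A plain nearest-neighbour lookup simply picks one candidate per frame, so practical systems usually add constraints such as smoothness of the articulatory trajectories over time.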

Prosody Modelling

Jpetiot/ January 7, 2005/ Modeling

Context: Prosodic features carry a substantial part of a language's identity, which may be sufficient for humans to identify some languages perceptually. Among these supra-segmental features, intonation is very promising for both linguistic analysis and automatic processing. However, intonation is difficult to model, both in terms of theoretical definition and of automatic processing. We propose here some prosodic modelling approaches that bring

Differentiated Modeling

Jpetiot/ January 7, 2002/ Modeling

Context: In standard modeling approaches, all classes share the same representation space (parameterization) and the same kind of model. However, classes can also be differentiated through separate representation spaces and distinct statistical models. For example, the differences in production between speech and music are naturally reflected in the nature of the signals themselves: speech presents a formant structure, while music shows
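The idea of giving each class its own representation space and its own model can be sketched as two independent binary detectors. This is a minimal illustration, not the method in this post. The feature functions (zero-crossing rate for speech, mean energy for music) and all Gaussian parameters are hypothetical stand-ins chosen only to keep the example self-contained.

```python
import math
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class Gaussian1D:
    mean: float
    var: float

    def log_pdf(self, x: float) -> float:
        return -0.5 * (math.log(2 * math.pi * self.var) + (x - self.mean) ** 2 / self.var)


@dataclass
class BinaryDetector:
    """One class vs. the rest, in a representation space owned by this class."""
    feature: Callable[[Sequence[float]], float]  # class-specific parameterization
    target: Gaussian1D                           # model of the class
    non_target: Gaussian1D                       # model of everything else

    def detect(self, frame: Sequence[float]) -> bool:
        x = self.feature(frame)
        # Log-likelihood ratio, computed within this detector's own space only.
        return self.target.log_pdf(x) - self.non_target.log_pdf(x) > 0.0


def zero_crossing_rate(frame: Sequence[float]) -> float:
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def mean_energy(frame: Sequence[float]) -> float:
    return sum(s * s for s in frame) / len(frame)


# Hypothetical parameters; in practice each model would be trained on labeled data.
speech_detector = BinaryDetector(zero_crossing_rate, Gaussian1D(0.3, 0.01), Gaussian1D(0.05, 0.01))
music_detector = BinaryDetector(mean_energy, Gaussian1D(0.25, 0.01), Gaussian1D(1.0, 0.01))
```

Each detector makes its decision without comparing likelihoods across different feature spaces. One consequence of this design is that a segment can be labeled as both speech and music, for example speech over background music.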