Prosody Modeling Differentiated Modeling Acoustic-to-articulatory Inversion


The classification and data mining methods studied draw on both generative and discriminative approaches, and they generally fall under supervised learning. Although Hidden Markov Models (HMM) remain the main framework in which the team develops its own models and classifiers, several new approaches have been investigated:

  • To exploit correlations between parameters, Dynamic Bayesian Networks have been studied. They lead to greater confidence and robustness than HMMs.
  • A new model derived from Support Vector Machines (SVM) has been proposed to take numerous databases into account and to process sequences of observation vectors of variable length. A new kernel between pairs of sequences has been studied theoretically within the SVM framework and implemented for automatic speaker verification.
  • A multi-level human model has been proposed to analyze human motion without prior knowledge of the video source. The model is decomposed into three hierarchical levels, each corresponding to a resolution level. Current developments concern the hierarchical decomposition and the handling of the matching process across the model levels. Both are designed to respect spatial and temporal constraints and to take into account the dynamic invariant aspects of human motion.
  • As information sources are very often multiple (within one medium or across media), the integration method becomes a key strategic issue. Multimodal integration methods are usually classified as decision fusion (or late fusion) and early fusion. To go beyond the classical combination of weighted scores and the straightforward concatenation of observation vectors, several strategies have been studied within a formal methodology: they rely on confidence indices (for classes, experts and observations) and on probabilistic and uncertainty theories. First experiments have been performed on automatic language identification, where acoustic, phonotactic and prosodic information are merged.
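
To make the distinction between the two classical fusion schemes concrete, the following is a minimal sketch of the baselines named in the last item: concatenation of observation vectors (early fusion) and a weighted combination of expert scores (late fusion). It is not the team's method, which goes beyond these baselines. The function names, the per-expert weights standing in for a confidence index, and the score layout are all assumptions made for illustration.

```python
from typing import Dict, List


def early_fusion(vectors: List[List[float]]) -> List[float]:
    """Early fusion baseline: concatenate the observation vectors of all sources."""
    fused: List[float] = []
    for vector in vectors:
        fused.extend(vector)
    return fused


def late_fusion(
    scores: Dict[str, Dict[str, float]],
    weights: Dict[str, float],
) -> Dict[str, float]:
    """Late (decision) fusion baseline: weighted sum of per-expert class scores.

    `scores[expert][label]` is the score an expert gives to a class.
    `weights[expert]` is a hypothetical per-expert confidence value; the
    weights are normalized here so that they sum to 1.
    """
    total = sum(weights[expert] for expert in scores)
    fused: Dict[str, float] = {}
    for expert, class_scores in scores.items():
        w = weights[expert] / total
        for label, score in class_scores.items():
            fused[label] = fused.get(label, 0.0) + w * score
    return fused


def decide(fused: Dict[str, float]) -> str:
    """Return the class with the highest fused score."""
    return max(fused, key=fused.get)
```

For language identification, `scores` would hold one entry per expert (for example acoustic, phonotactic and prosodic), each mapping candidate languages to a score, and `decide(late_fusion(scores, weights))` would return the selected language. A single weight per expert is the simplest possible choice; confidence indices attached to classes or to individual observations, as mentioned above, would need a richer weighting structure.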