Context Human motion analysis is a problem that has been addressed in different ways depending on the intended goal. Methods using low-level or high-level features, such as optical flow, have been proposed in the past. Most of them are dedicated to one specific task, such as the recognition of a specific motion, and are therefore difficult to
Context One of the most difficult tasks in speech processing is to define the limits of the phonetic units present in the signal. Phones are strongly co-articulated and there are no clear borders between them, so the link between the linguistic and the acoustic segmentation is not simple to define. It does not matter which code level is chosen (word, syllable,
Context Similarity between two video documents is a concept that should take into account their content as well as their structure, in terms of temporal order. We propose an algorithm that performs this kind of comparison, from which we derive various applications such as a generic measure to estimate “style similarity” or a segmentation tool to split a long
Context The notion of “Pseudo Syllable” was introduced to analyze language rhythms. It was intended for use in the prosodic module of our Automatic Language Identification system. Its relevance is confirmed by the results obtained on multilingual language discrimination tasks. The concept can be applied to other research fields, in particular to multilingual systems, where language-specific syllable segmentation cannot be
Context Some experiments on automatic video summarization showed that the costume feature is one of the most significant clues for identifying keyframes belonging to given excerpts. This property is mainly explained by the fact that costumes are attached to the character function in the video document. The approach proposed here consists first in characterizing the region located
Context The objective of the feature extraction step is to capture the most relevant and discriminative characteristics of the signal to be recognized. Although characters in the same class vary because different users have different speaking styles, some consistencies must exist. This is why feature extraction is needed; the extracted features are then passed to a classifier.
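The two-stage idea described above (extract features, then classify) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the work summarized here: the features (mean frame energy and zero-crossing rate), the nearest-centroid classifier, the names `extract_features` and `NearestCentroid`, the frame length, and the toy signals.

```python
import math


def extract_features(signal, frame_len=4):
    """Return [mean frame energy, mean zero-crossing rate] of a 1-D signal.

    Illustrative features only; the signal is cut into non-overlapping frames.
    """
    if len(signal) < frame_len:
        raise ValueError("signal is shorter than one frame")
    frames = [signal[i:i + frame_len]
              for i in range(0, len(signal) - frame_len + 1, frame_len)]
    # Energy: mean squared amplitude of each frame.
    energies = [sum(x * x for x in f) / frame_len for f in frames]
    # Zero-crossing rate: fraction of adjacent sample pairs that change sign.
    zcrs = [sum(1 for a, b in zip(f, f[1:]) if a * b < 0) / (frame_len - 1)
            for f in frames]
    return [sum(energies) / len(frames), sum(zcrs) / len(frames)]


class NearestCentroid:
    """Toy classifier: assigns a feature vector to the closest class mean."""

    def fit(self, features, labels):
        sums, counts = {}, {}
        for vec, lab in zip(features, labels):
            acc = sums.setdefault(lab, [0.0] * len(vec))
            for i, v in enumerate(vec):
                acc[i] += v
            counts[lab] = counts.get(lab, 0) + 1
        self.centroids = {lab: [s / counts[lab] for s in acc]
                          for lab, acc in sums.items()}
        return self

    def predict(self, vec):
        return min(self.centroids,
                   key=lambda lab: math.dist(vec, self.centroids[lab]))


# Hypothetical training signals: same class, different amplitudes ("styles").
train = [
    ([1, 1, 1, 1, 1, 1, 1, 1], "steady"),
    ([2, 2, 2, 2, 2, 2, 2, 2], "steady"),
    ([1, -1, 1, -1, 1, -1, 1, -1], "alternating"),
    ([2, -2, 2, -2, 2, -2, 2, -2], "alternating"),
]
clf = NearestCentroid().fit([extract_features(s) for s, _ in train],
                            [lab for _, lab in train])

# A new signal with an unseen amplitude is classified from its features.
prediction = clf.predict(extract_features([3, 3, 3, 3, 3, 3, 3, 3]))
```

In this toy setting the amplitude differs between signals of the same class, while the zero-crossing rate stays consistent within a class, which is the kind of consistency a feature extraction step is meant to expose to the classifier.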