Pseudo Syllable

Jpetiot / January 7, 2006 / Analysis


The notion of “Pseudo Syllable” was introduced to analyze language rhythm. It was intended for use in the prosodic module of our Automatic Language Identification system. Its relevance is confirmed by the results obtained on multilingual language discrimination tasks. The concept can be applied to other research fields, in particular to multilingual systems, where language-specific syllable segmentation cannot be performed.


From a prosodic point of view, languages differ in their rhythm and intonation. The syllable is a prime candidate for rhythm modeling. Unfortunately, segmenting speech into syllables is typically a language-specific mechanism, and thus no language-independent algorithm can be derived. For this reason, we introduced in [Farinas & Pellegrino, 2001] the notion of “Pseudo Syllable”, derived from the most frequent syllable structure in the world, the Consonant-Vowel structure (see the work of Dauer, 1983). In our algorithm, the speech signal is parsed into patterns matching the .CnV. structure, where n is an integer that may be zero and V may result from the merging of consecutive vowel segments.


Figure 1. Example of Pseudo Syllable parsing on the French utterance “Et la mer est très bonne”.

For example, if the vowel detection algorithm produces the sequence (CCVVCCVCVCCCVCVCCC), it is parsed into the following sequence of 5 pseudo-syllables: (CCV.CCV.CV.CCCV.CV). The trailing consonant cluster (CCC) is discarded as it does not contain any vowel, and the two adjacent vocalic segments in the first Pseudo Syllable are merged.
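The parsing rule described above can be sketched in a few lines of Python. This is only an illustration, not the system's actual implementation: it assumes the vowel detection output is already available as a string of 'C' and 'V' labels, and the function name parse_pseudo_syllables is hypothetical.

```python
import re
from typing import List


def parse_pseudo_syllables(labels: str) -> List[str]:
    """Split a string of 'C'/'V' segment labels into .CnV. pseudo-syllables.

    Assumption: `labels` contains only the characters 'C' and 'V'.
    """
    # Each match is zero or more consonant segments followed by one or more
    # vowel segments; consecutive vowels are merged into a single V.
    # A trailing consonant run contains no vowel, never matches, and is
    # therefore discarded.
    return [m.group(1) + "V" for m in re.finditer(r"(C*)V+", labels)]


if __name__ == "__main__":
    print(".".join(parse_pseudo_syllables("CCVVCCVCVCCCVCVCCC")))
    # prints CCV.CCV.CV.CCCV.CV
```

Applied to the sequence from the example, the sketch returns the same five pseudo-syllables, with the doubled vowel merged and the final CCC cluster dropped.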

Pseudo-syllable description

Both rhythmic and fundamental frequency features are computed to describe each Pseudo-Syllable. The parameters used are described below.

Rhythmic features

For each pseudo-syllable, three parameters are computed, corresponding respectively to the total consonant cluster duration, the total vowel duration and the complexity of the consonantal cluster. For example, the description for a .CCV. pseudo-syllable is: P(.CCV.)={Dc; Dv; Nc}, where Dc is the total duration of the consonantal segments, Dv is the duration of the vowel segment and Nc is the number of segments in the consonantal cluster (in the above example, Dc = 150 ms, Dv = 50 ms and Nc = 3). Such a basic rhythmic parsing is obviously limited, but it provides a framework to model rhythm that requires no knowledge of the rhythmic structure of the language.
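As a rough illustration of the {Dc; Dv; Nc} description, the following Python sketch computes the three parameters from a list of labelled segments. The input format (pairs of label and duration in milliseconds), the names RhythmFeatures and rhythm_features, and the durations in the usage example are assumptions made for this sketch. It also assumes that the duration of a merged vowel is the sum of its vowel segments.

```python
from typing import List, NamedTuple, Tuple


class RhythmFeatures(NamedTuple):
    dc_ms: float  # Dc: total duration of the consonantal segments
    dv_ms: float  # Dv: total duration of the (merged) vowel segments
    nc: int       # Nc: number of segments in the consonantal cluster


def rhythm_features(segments: List[Tuple[str, float]]) -> List[RhythmFeatures]:
    """Compute {Dc; Dv; Nc} for each .CnV. pseudo-syllable.

    Assumption: `segments` is a list of (label, duration_ms) pairs with
    label 'C' or 'V', in temporal order.
    """
    features = []
    dc, dv, nc, seen_vowel = 0.0, 0.0, 0, False
    for label, duration in segments:
        if label == "C":
            if seen_vowel:
                # A consonant after a vowel closes the current pseudo-syllable.
                features.append(RhythmFeatures(dc, dv, nc))
                dc, dv, nc, seen_vowel = 0.0, 0.0, 0, False
            dc += duration
            nc += 1
        elif label == "V":
            dv += duration
            seen_vowel = True
        else:
            raise ValueError(f"unexpected segment label: {label!r}")
    if seen_vowel:
        features.append(RhythmFeatures(dc, dv, nc))
    # A trailing consonant run without a vowel is discarded.
    return features


if __name__ == "__main__":
    example = [("C", 70.0), ("C", 50.0), ("V", 80.0),
               ("C", 40.0), ("V", 60.0), ("V", 30.0), ("C", 20.0)]
    for f in rhythm_features(example):
        print(f)
```

With these made-up durations, the first pseudo-syllable (.CCV.) is described by Dc = 120 ms, Dv = 80 ms and Nc = 2, and the second (.CV. with a merged vowel) by Dc = 40 ms, Dv = 90 ms and Nc = 1.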

Fundamental frequency features

The fundamental frequency contours are used to compute statistics within the same pseudo-syllable boundaries as those used for rhythm modeling, in order to model the intonation of each pseudo-syllable. We chose to compute statistics up to the 4th order (mean, standard deviation, skewness and kurtosis), in order to describe the variations of intonation within a pseudo-syllable.
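The four statistics can be sketched as follows for the fundamental frequency values falling inside one pseudo-syllable. The text does not specify the estimator conventions, so this sketch assumes population (biased) central moments and non-excess kurtosis. It also assumes that the input contains only voiced frames, in Hz. The names F0Stats and f0_statistics are hypothetical.

```python
import math
from typing import NamedTuple, Sequence


class F0Stats(NamedTuple):
    mean: float
    std: float
    skewness: float
    kurtosis: float


def f0_statistics(f0_hz: Sequence[float]) -> F0Stats:
    """Mean, standard deviation, skewness and kurtosis of F0 values.

    Assumptions: `f0_hz` holds the voiced-frame F0 values of a single
    pseudo-syllable; moments are population moments; kurtosis is not
    reduced by 3 (a normal distribution gives 3.0).
    """
    n = len(f0_hz)
    if n == 0:
        raise ValueError("no F0 values in this pseudo-syllable")
    mean = sum(f0_hz) / n
    # Central moments of order 2, 3 and 4.
    m2 = sum((x - mean) ** 2 for x in f0_hz) / n
    m3 = sum((x - mean) ** 3 for x in f0_hz) / n
    m4 = sum((x - mean) ** 4 for x in f0_hz) / n
    if m2 == 0.0:
        # Flat contour: the higher-order statistics are undefined.
        return F0Stats(mean, 0.0, math.nan, math.nan)
    return F0Stats(mean, math.sqrt(m2), m3 / m2 ** 1.5, m4 / m2 ** 2)


if __name__ == "__main__":
    print(f0_statistics([100.0, 110.0, 120.0, 130.0, 140.0]))
```

For the linearly rising contour in the usage example, the mean is 120 Hz, the skewness is 0 (the values are symmetric around the mean) and the kurtosis is 1.7 under the conventions assumed here.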




