Iferrane / January 8, 2019

Impact of gestures on the pronunciation of foreign language learners

Main issues and objectives


The question of how gesture and speech are associated in second language (L2) learning emerged from a workshop organized by the project partners involved at different levels in L2 learning. Starting from an initial reflection on the technical possibilities of combining gesture production and pronunciation learning, several international academic experts in (1) gesture analysis, (2) language acquisition, and (3) the link between didactics and digital processing were invited to IRIT in October 2017 to discuss the matter. The company Archean Technologies, which works on language learning software, was also asked to give its industrial point of view on the following questions: “What do we know about the role of gestures in acquiring the pronunciation of a foreign language? How can we study and measure this link?” This is how the idea of the INGPRO project emerged. Our proposal was submitted and accepted in 2018 as a Regional Council Project (Occitanie).

INGPRO aims to propose a methodology that can answer these questions, validate the underlying assumptions, and integrate the software functionalities into an experimental platform in order to carry out a set of tests with a wide audience of learners. Over this two-year project, we aimed to demonstrate the feasibility of our proposal.

SAMoVA’s contribution

Our team was mainly involved in the study of speech and gesture coordination. We relied on a preliminary study carried out by the Octogone laboratory (project leader) on the gestures performed by L2 teachers during pronunciation teaching, especially of French. The objective of that study was to produce a typology and a characterization of the gestures used for pronunciation correction. It was followed by a study of the coordination of gesture and speech in situations of phonetic correction [1]. Based on this work, we proposed a methodology for animating a 3D avatar that produces a set of corrective gestures, each associated with its corrective stimulus [2]. Both pieces of work are summarized below.

Study of Speech and Gesture Coordination

Starting from a set of video recordings of phonetic correction sessions led by teachers of French as a Foreign Language (FFL), a manual annotation step of the audio track was performed to mark the phases that describe the anatomy of a gesture, as defined in [McNeill, 2006] and [Bressem, 2011] (rest position, preparation, stroke, hold, retraction, …), as well as the parts of the body involved in producing the gesture (hands, arms, torso, head, gaze, …).

These annotations were aligned with the teacher’s utterances, and more specifically with manual or automatic annotations at the phoneme level. This is illustrated in Figure 1 (from [1]), where the red frame shows the body part annotations, the yellow frame shows the temporal sequence of the gesture phases, and the green frame shows the annotations available at the phoneme level.

Figure 1: Annotation levels for the teacher’s corrective speech: body parts (red); gesture phases for each body part involved (yellow); automatic or manual annotations at the phoneme level (green).
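The alignment between gesture phases and phoneme-level annotations described above can be pictured as an overlap query between two annotation tiers. The Python sketch below is only an illustration of that idea: the `Interval` class, the `overlapping` function, and every label and time value are hypothetical and are not taken from the project's annotation files or tools.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """A labelled time span on one annotation tier (times in seconds)."""
    label: str
    start: float
    end: float


def overlapping(phase, phonemes):
    """Return the phoneme intervals that overlap a gesture phase in time."""
    return [p for p in phonemes if p.start < phase.end and p.end > phase.start]


# Hypothetical phoneme tier and gesture phase, for illustration only.
phoneme_tier = [
    Interval("b", 0.00, 0.08),
    Interval("o", 0.08, 0.20),
    Interval("Z", 0.20, 0.30),
    Interval("u", 0.30, 0.45),
    Interval("R", 0.45, 0.55),
]
stroke = Interval("stroke", 0.10, 0.30)

# Phonemes uttered while the stroke phase is being performed.
print([p.label for p in overlapping(stroke, phoneme_tier)])  # ['o', 'Z']
```

Intervals that only touch at a boundary (here the phoneme starting at 0.30) are not counted as overlapping; whether that convention suits a given annotation scheme is a design choice.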

Towards a gesture typology

The temporal decomposition of each type of gesture (as shown in Figure 2) was studied in order to highlight intra- and inter-teacher variability and regularity in the performance of corrective gestures.

Figure 2: Analysis of the same kind of gesture performed by two different teachers, (a) and (b), around the syllable that contains the phoneme to be corrected. Each subfigure corresponds to a temporal window in which 0 marks the start time of the target syllable containing the corrected phoneme.
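The time axis used in Figure 2, where 0 marks the onset of the target syllable, amounts to shifting each gesture phase by the syllable start time so that different teachers can be compared on a common scale. A minimal Python sketch of this re-centering follows; the function names `recenter` and `stroke_lag`, and all time values, are hypothetical and do not come from the study data.

```python
def recenter(phases, syllable_onset):
    """Shift (label, start, end) phases so the target syllable starts at t = 0."""
    return [(label, start - syllable_onset, end - syllable_onset)
            for label, start, end in phases]


def stroke_lag(phases, syllable_onset):
    """Stroke onset relative to syllable onset, in seconds.

    A negative value means the stroke starts before the syllable.
    Returns None when no stroke phase was annotated.
    """
    for label, start, _end in recenter(phases, syllable_onset):
        if label == "stroke":
            return start
    return None


# Hypothetical annotations for the same gesture type from two teachers.
teacher_a = [("preparation", 1.25, 1.75), ("stroke", 1.75, 2.5), ("retraction", 2.5, 3.0)]
teacher_b = [("preparation", 4.0, 4.5), ("stroke", 4.5, 5.25), ("retraction", 5.25, 5.5)]

print(stroke_lag(teacher_a, syllable_onset=2.0))   # -0.25
print(stroke_lag(teacher_b, syllable_onset=4.25))  # 0.25
```

Comparing such relative onsets across repetitions by one teacher, or across teachers, is one simple way to quantify the variability the figure illustrates.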

Modeling gesture types

We designed an experimental protocol to submit pronunciation exercises, in the target foreign language, to different groups of learners. The idea was to carry out a contrastive study of the effect of gestures used to reinforce pronunciation correction. We chose to use a 3D avatar as a virtual teacher for two main reasons: (1) to better control the experimental conditions, in particular the type of corrective gestures presented to the learners (gesture parameters, body parts involved, combination with spoken corrective stimuli, etc.), and (2) to reduce inter-teacher variability and limit the presence of other, unconsciously produced communication signals that could influence the learner.

Gesture selection

Based on the literature and the analysis of teacher video corpora, six different co-verbal gestures were selected. They can be used in an iconic way for suprasegmental correction (illustrating a concrete concept such as rising intonation) or in a metaphoric way (illustrating an abstract concept such as tension or relaxation). Each type of gesture is illustrated below:

Upward | Upward + Downward | Monotone | Relaxing | Tension | Opening

Animation of a 3D avatar

The 3D avatar (Figure 3, left) has to perform a gesture that is synchronized with the pronunciation of its associated corrective stimulus, with or without emphasis (as shown in the video below). A tool was developed [2] to generate animations that combine lip synchronization, based on a phoneme-to-viseme map ([Bear, 2017]; see Figure 3, middle), with the associated corrective gesture (Figure 3, right).

Figure 3: 3D avatar (left) – Facial animation using phoneme to viseme map (middle) – Gesture generation (right)
Example: normal pronunciation on the left, pronunciation with emphasis on the right
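Lip synchronization based on a phoneme-to-viseme map can be sketched as a lookup that turns a timed phoneme sequence into a timed sequence of mouth shapes. The Python example below is a toy illustration under stated assumptions: the `PHONEME_TO_VISEME` table, the viseme names, and the `viseme_track` function are hypothetical, and they reproduce neither the mappings studied in [Bear, 2017] nor the tool described in [2].

```python
# Hypothetical, deliberately tiny phoneme-to-viseme table (illustration only).
PHONEME_TO_VISEME = {
    "p": "closed_lips", "b": "closed_lips", "m": "closed_lips",
    "f": "lip_teeth", "v": "lip_teeth",
    "a": "open",
    "o": "rounded", "u": "rounded",
}


def viseme_track(phonemes, mapping, default="neutral"):
    """Convert (phoneme, start, end) triples into (viseme, start, end) keyframes.

    Consecutive phonemes that share a viseme and are contiguous in time are
    merged, so the mouth shape is held instead of being re-triggered.
    """
    track = []
    for label, start, end in phonemes:
        viseme = mapping.get(label, default)
        if track and track[-1][0] == viseme and track[-1][2] == start:
            # Extend the previous keyframe rather than adding a duplicate.
            track[-1] = (viseme, track[-1][1], end)
        else:
            track.append((viseme, start, end))
    return track


# Hypothetical timed phoneme sequence (times in seconds).
timed_phonemes = [("b", 0.0, 0.1), ("m", 0.1, 0.2), ("a", 0.2, 0.4), ("k", 0.4, 0.5)]
for keyframe in viseme_track(timed_phonemes, PHONEME_TO_VISEME):
    print(keyframe)
```

Because several phonemes typically share one viseme, the resulting track is usually shorter than the phoneme sequence; phonemes missing from the table fall back to the `default` shape here, which is only one possible policy.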

Project Members

In mid-November 2020, Amélie Le Chevanton joined us as a research engineer to further develop the Web platform dedicated to foreign language learning. Her first mission was to integrate the avatar into the web service platform used for the experimental sessions, which were organized in spring 2021 after being delayed by the COVID period. This tool was designed to collect learners’ speech productions and to study the impact of gestures performed during phonetic correction [4].

Marie Philippart de Foy and Léonardo Contreras Roa joined us as part-time post-doctoral researchers in the last phase of the project. They analyzed in detail the data collected during the experimental sessions with Japanese-speaking students that took place in May 2021 [5][6]. We warmly thank Corentin Barcat (PhD, Tokyo University of Foreign Studies) for his help in collecting the data using the aforementioned application.

Within the scope of INGPRO, complementary work was carried out to transfer gesture generation and audio alignment to beat gestures, in order to help visualize the tonic accent when learning specialized English vocabulary [3].


[McNeill, 2006] David McNeill, Gesture: A Psycholinguistic Approach, University of Chicago. For the Psycholinguistics Section, The Encyclopedia of Language and Linguistics (2006).

[Bressem, 2011] Jana Bressem & Silva H. Ladewig, Rethinking gesture phases: Articulatory features of gestural movement?, Semiotica 2011 (184): 53-91 (2011).

[Bear, 2017] H. L. Bear, R. Harvey, Phoneme-to-viseme mappings: the good, the bad, and the ugly, Speech Communication, Vol. 95 (Dec 2017), 40–67.

Internship reports

[1] Kellian Ballentine, Speech analysis and gesture coordination for learning French as a foreign language, Master 2 internship report, August 2019.

[2] Franck Donny, Project INGPRO: animation of a 3D avatar, Master 2 internship report, September 2020.

[3] Fabien Kambu, Vers l’intégration d’un avatar existant dans la plateforme de jeux sérieux dédiée à l’apprentissage de langues étrangères, Master 2 internship report, September 2021.

INGPRO Publications

[4] Charlotte Alazard-Guiu, Nadia Yassine-Diab, Isabelle Ferrané, Lionel Fontan, Julien Pinquier, Thomas Pellegrini, Amélie Le Chevanton, Studying the impact of congruent and incongruent corrective gestures on the acquisition of L2 pronunciation, ONELA 2021: Outils et Nouvelles Explorations de la Linguistique Appliquée / Instrumentation and New Explorations in Applied Linguistics, Toulouse, 19-21 October 2021.

[5] Charlotte Alazard-Guiu, Leonardo Contreras Roa, Isabelle Ferrané, Lionel Fontan, Amélie Le Chevanton, Maxime Le Coz, Marie Philippart de Foy, Thomas Pellegrini, Julien Pinquier, Nadia Yassine-Diab, L’apport du geste dans l’acquisition de la prononciation en L2 via un outil d’apprentissage en ligne : une étude pilote, Journées d’études du GIS Réseau d’acquisition des langues secondes (REAL2 2021), Nov 2021, Paris, France.

[6] Charlotte Alazard-Guiu, Leonardo Contreras Roa, Isabelle Ferrané, Marie Philippart de Foy, Lionel Fontan, Nadia Yassine-Diab, Julien Pinquier, Apport du geste dans l’acquisition de la prononciation en L2 : quand la réalité ne correspond pas aux attentes, 34èmes Journées d’Études sur la Parole (JEP 2022), Jun 2022, Noirmoutier, France.




  • Two-year project supported by the Occitanie Regional Council (plus a nine-month extension)


  • Start time: 1st January 2019
  • End time: 30th September 2021 (including a nine-month extension granted by the Regional Council)
