Foreign Language Learning assisted by Artificial Intelligence
Apprentissage des Langues Assisté par Intelligence Artificielle
ALAIA is a joint laboratory (LabCom) between IRIT and Archean Technologies. Funded for 3 years by the ANR, the French National Research Agency, this LabCom started in March 2019. ALAIA builds on the synergy between the SAMoVA team and Archean Labs, the R&D department of Archean Technologies.
Main issues and objectives
The core of ALAIA is to put computer technologies and artificial intelligence methods at the service of foreign language learning. The originality of our project is to focus on oral expression, through the evaluation of the quality of utterance pronunciation by foreign language learners, while considering the impact of their mother tongue on the target language. Our main objectives are to:
- rely on the partners’ expertise in order to develop and deploy innovative software in the field of foreign language learning;
- adopt a highly interdisciplinary approach based on the fields of didactics and linguistics, computer research and techniques for interaction with learners;
- integrate the methods from these three domains in the development of software building blocks.
We focus on the Japanese-French language pair, with Japanese as the mother tongue (L1) and French as the target language (L2). The methodology will subsequently be applied to other language pairs. This work relies on the expertise of teachers in foreign language didactics, who are already working with ALAIA’s partners.
Within the framework of the ALAIA innovative program, our first step is to work on phonetic-phonological skills by focusing on the pronunciation of phonemes at word level. Automatic detection, localization and characterization of segmental production errors is our main focus. This relies on:
(1) Speech stimuli in French.
Based on LexPro, this dataset has been developed in collaboration with the University of Waseda (S. Detey) and the University of Pau (F. Hapel and C. Domin). About 2500 utterances were recorded in standard French and transcribed both orthographically and phonetically.
(2) Recordings of Japanese learners’ production (in French).
A first dataset of about 15000 recordings, resulting from earlier collaborations with the University of Waseda, has been put to use within ALAIA.
(3) Phonetic annotations made by experts
To date, more than 7300 utterances, based on 150 items (distinct words or short sentences) produced by 80 learners, have been manually transcribed at the phoneme level. Using an annotation tool developed specifically for this purpose, experts annotated about 55000 phonemes, with labels describing the type of pronunciation error encountered. This ongoing work, carried out in collaboration with the University of Waseda, aims to define a mapping of pronunciation errors at the segmental level.
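As an illustration only, the idea of detecting, localizing and characterizing segmental errors can be sketched by aligning the expected phoneme sequence of an item with the sequence transcribed by an annotator. This is a minimal sketch and not the method used in ALAIA: it relies on a generic sequence alignment from Python's standard library (`difflib.SequenceMatcher`) rather than on any acoustic analysis, and the function name, the record type and the SAMPA-like phoneme symbols are all hypothetical.

```python
from difflib import SequenceMatcher
from typing import List, NamedTuple


class SegmentalError(NamedTuple):
    """One mismatch between expected and produced phonemes (hypothetical record)."""
    kind: str            # "substitution", "deletion" or "insertion"
    position: int        # index in the expected phoneme sequence
    expected: List[str]  # expected phoneme(s), empty for an insertion
    produced: List[str]  # produced phoneme(s), empty for a deletion


# Map difflib's opcode names onto the usual segmental error types.
_KIND = {"replace": "substitution", "delete": "deletion", "insert": "insertion"}


def find_segmental_errors(expected: List[str], produced: List[str]) -> List[SegmentalError]:
    """Align two phoneme sequences and list where and how they differ."""
    matcher = SequenceMatcher(a=expected, b=produced, autojunk=False)
    errors = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue  # identical stretch, nothing to report
        errors.append(SegmentalError(_KIND[tag], i1, expected[i1:i2], produced[j1:j2]))
    return errors


# Invented example (not taken from the ALAIA data): an item with the
# expected phonemes "b o~ Z u R" and a hypothetical learner production.
expected = ["b", "o~", "Z", "u", "R"]
produced = ["b", "o", "n", "Z", "u", "R", "u"]
for error in find_segmental_errors(expected, produced):
    print(error)
```

Each reported error carries its type and its position in the expected sequence, which loosely mirrors the kind of information the expert annotations attach to each phoneme. A real system would also need to work from the audio signal, which this sketch does not attempt.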
Thanks to this important dataset, we are also currently working on the automatic analysis and classification of pronunciation errors at the phonetic level. The different steps of our process are summed up in the figure below.
A new industrial PhD (CIFRE) started in January 2021 to work on linguistic levels, focusing on lexical and syntactic errors occurring in learners' productions. The aim of this research work is to propose a comprehensibility measure covering both the pronunciation and linguistic levels, applied to more spontaneous utterances, which differ from word or sentence repetitions.
Alice Cohen-Hadria worked with us from February 2020 to February 2021 as a post-doc.
Antoine Viette has been working with us since March 2021, as a research engineer, on the automatic process of pronunciation error detection, localization and characterization.
Gautier Arcin joined us in June 2021, as a research engineer, to work on the platform on which software modules will be integrated and deployed.
Verdiana De Fino has been working with us since January 2021, as an industrial PhD student, on the definition of a comprehensibility measure.
They work in close collaboration with the steering committee members (see below).
The LabCom governance is organized through two committees:
- Isabelle Ferrané (IRIT – Head) – Lionel Fontan (Archean Labs – Co-Head)
- Julien Pinquier (IRIT) – Thomas Pellegrini (IRIT)
- Xavier Aumont: President of Archean Technologies
- Jean-Pierre Jessel: Vice President of Research – University Toulouse III
- Jérôme Lelasseux: Local SATT representative – TTT Toulouse Tech transfer
- Jean-Marc Pierson: Head of IRIT, representing also the head of the INS2I institute of the CNRS
- Charlotte Sicre: Data protection contact at IRIT – GDPR (RGPD) correspondent
- Laurent Lacheny: Representative of the ADDOC Agency – Economic development agency of Occitany
- Olivier Baude: Professor of Language Sciences – University of Paris Nanterre – Director of the TGIR Huma-Num
- Sylvain Detey: Professor of Language Sciences, Waseda University – Japan – Expert in language didactics
Publications related to ALAIA
- Estelle Randria, Lionel Fontan, Maxime Le Coz, Isabelle Ferrané, Julien Pinquier, Étude des facteurs affectant la compréhensibilité de documents multimodaux : une étude expérimentale, 6e conférence conjointe Journées d’Études sur la Parole (JEP, 33e édition), Traitement Automatique des Langues Naturelles (TALN, 27e édition), Rencontre des Étudiants Chercheurs en Informatique pour le Traitement Automatique des Langues (RÉCITAL, 22e édition). Volume 1 : Journées d’Études sur la Parole, 2020, Nancy, France. pp.534-542.
- Estelle I. S. Randria, Lionel Fontan, Maxime Le Coz, Isabelle Ferrané, Julien Pinquier, Subjective Evaluation of Comprehensibility in Movie Interactions. LREC 2020: 2348-2357
- Detey, S., Fontan, L., Le Coz, M., Jmel, S.: Computer assisted assessment of phonetic fluency in second language: a longitudinal study of Japanese learners of French. Speech Communication (2020) 125:69-79.
Funding and Schedule
- ANR Joint Laboratory Program – 3-year project plus 6-month extension due to COVID conditions
- Start date: 1st March 2019 – End of project: 31st August 2022