ADAPT – Assistance to the Analysis and DiAgnosis of PaThological speech
1 Context and objectives
The ADAPT project (« Aide à l’Analyse et au DiAgnostic de la Parole pathologique pour les Thérapeutes ») builds on the PATY platform (Plateforme de traitement de Parole ATYpique, https://paty.irit.fr/demo/), developed in 2021 by the SAMoVA team (Figure 1).
Figure 1 – The PATY platform
PATY is an online audio processing platform whose objective is to provide automatic speech transcription results that are as complete as possible (recognition and transcription of phonemes and words, as well as confidence scores).
While the current models on PATY perform well on typical (healthy) speech, the first challenge of this project concerns the recognition of pathological speech, especially speech after ENT cancer. Cancer and its treatments alter patients’ speech abilities, degrading articulation and limiting intelligibility. Current ASR systems are not adapted to post-cancer speech and perform poorly on words produced by patients, with nearly 85% word recognition errors (Balaguer, 2021). Adapted tools, allowing faster and more reliable analysis of post-cancer speech, would make it possible to adjust therapeutic strategies to patients’ specific speech deficits and to better meet their needs (Middag et al., 2014). Automatic analysis based on new ASR models adapted to post-cancer speech (Gelin et al., 2021; Wang & Zheng, 2015) and supported by PATY would address the limitations of the perceptual assessment usually performed (Pommée et al., 2021), especially regarding the variability of measurements.
Moreover, analyzing speech in an “ecological” context, i.e., as close as possible to everyday speech production, requires taking the interlocutor in the communication situation into account. The analysis of spontaneous speech is complex because the recording contains more than one speaker (the patient and one or more interlocutors; Balaguer et al., 2020). Adding speaker recognition to ASR systems would make it possible to target recognition at the speaker of interest, here the patient, and thus to improve the recognition of patient speech specifically. The pyannote.audio tool (https://github.com/pyannote/pyannote-audio) meets these needs, but it requires a GPU to process the data in a time compatible with current clinical practice (on a CPU, i.e., a standard user’s computer, processing is extremely time-consuming). Hosting pyannote.audio on PATY would therefore meet both the speaker recognition and the practical-use needs.
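As a minimal sketch of how diarization output could be used to target the speaker of interest, the snippet below keeps only the time-stamped ASR words that fall inside the segments attributed to the patient. The data structures, the `patient` label, and the words themselves are hypothetical stand-ins; in a real pipeline the segments would come from a diarization tool such as pyannote.audio and the words from the ASR system.

```python
# Sketch: keep only ASR words whose temporal midpoint falls inside a
# diarized segment attributed to the speaker of interest.
# Segments: (start, end, speaker_label); words: (word, start, end),
# all times in seconds. Both lists are illustrative, not real output.

def words_for_speaker(words, segments, speaker="patient"):
    """Return the ASR words spoken during `speaker`'s diarized segments."""
    kept = []
    for word, start, end in words:
        mid = (start + end) / 2
        for seg_start, seg_end, seg_speaker in segments:
            if seg_speaker == speaker and seg_start <= mid <= seg_end:
                kept.append((word, start, end))
                break
    return kept

segments = [(0.0, 2.0, "clinician"), (2.0, 5.5, "patient")]
words = [("bonjour", 0.2, 0.8), ("oui", 2.3, 2.6), ("merci", 4.9, 5.3)]

print(words_for_speaker(words, segments))
# → [('oui', 2.3, 2.6), ('merci', 4.9, 5.3)]
```

Matching on the word midpoint is one simple design choice; a real implementation would also have to decide how to handle words straddling a segment boundary or overlapping speech.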
A second challenge concerns the use of the PATY platform’s outputs by the target users, i.e., clinicians or researchers, many of whom are not familiar with automatic tools. It would be relevant to propose new output files and formats, for example TextGrid files with phoneme-level alignment (and not only word-level alignment, as PATY currently provides). Moreover, to allow users from different communities (clinicians, researchers in linguistics, computer scientists…) to exploit the results, options to choose the content of the output files (words only; words with time intervals but without confidence scores; confidence scores only, for statistical analysis…) would address the specific needs of each user. Finally, implementing speaker recognition (such as pyannote.audio) would also make it possible to merge the speech recognition and speaker recognition results into a single file indicating which speaker says what, and when.
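To illustrate what a phoneme-level TextGrid export could look like, the sketch below serializes a list of (start, end, label) intervals as a single-tier Praat TextGrid in the standard long text format. The phone labels and timings are illustrative, not actual PATY output, and a production exporter would need to handle multiple tiers (phones, words, speakers) and gaps between intervals.

```python
# Sketch: write a minimal Praat TextGrid (long text format) with one
# interval tier, e.g. for phoneme-level alignments. The interval list
# is a hypothetical stand-in for ASR alignment output.

def write_textgrid(intervals, tier_name="phones"):
    """Serialize contiguous (start, end, label) intervals as a one-tier TextGrid."""
    xmin = intervals[0][0]
    xmax = intervals[-1][1]
    lines = [
        'File type = "ooTextFile"',
        'Object class = "TextGrid"',
        "",
        f"xmin = {xmin}",
        f"xmax = {xmax}",
        "tiers? <exists>",
        "size = 1",
        "item []:",
        "    item [1]:",
        '        class = "IntervalTier"',
        f'        name = "{tier_name}"',
        f"        xmin = {xmin}",
        f"        xmax = {xmax}",
        f"        intervals: size = {len(intervals)}",
    ]
    for i, (start, end, label) in enumerate(intervals, 1):
        lines += [
            f"        intervals [{i}]:",
            f"            xmin = {start}",
            f"            xmax = {end}",
            f'            text = "{label}"',
        ]
    return "\n".join(lines) + "\n"

# Illustrative phoneme alignment (labels and times are made up).
phones = [(0.0, 0.12, "b"), (0.12, 0.30, "o~"), (0.30, 0.45, "Z")]
print(write_textgrid(phones))
```

The same tier mechanism would carry the merged output described above: one tier per speaker, or a dedicated speaker tier alongside the word and phone tiers.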
The objective of ADAPT is therefore to enrich and develop the infrastructure of the PATY platform by adding new features (such as speaker diarization) and by adapting the output interface to make it more easily usable by clinicians, who are generally not used to automatic processing or the manipulation of computer tools.
2 Partner
Laboratoire Informatique d’Avignon: Corinne Fredouille
3 People involved in the SAMoVA team
- Mathieu Balaguer (scientific coordinator)
- Hervé Bredin
- Jérôme Farinas
- Julien Pinquier
Funding: Institut Carnot Cognition, 24 k€
Start date: 1st January 2023
End date: 31st December 2023
References
Balaguer, M. (2021). Mesure de l’altération de la communication par analyses automatiques de la parole spontanée après traitement d’un cancer oral ou oropharyngé [Université Toulouse III Paul Sabatier]. http://www.theses.fr/2021TOU30109
Balaguer, M., Pommée, T., Farinas, J., Pinquier, J., Woisard, V., & Speyer, R. (2020). Effects of oral and oropharyngeal cancer on speech intelligibility using acoustic analysis: Systematic review. Head & Neck, 42(1), 111–130. https://doi.org/10.1002/hed.25949
Gelin, L., Daniel, M., Pinquier, J., & Pellegrini, T. (2021). End-to-end acoustic modelling for phone recognition of young readers. Speech Communication, 134, 71–84. https://doi.org/10.1016/j.specom.2021.08.003
Middag, C., Clapham, R., Van Son, R., & Martens, J. P. (2014). Robust automatic intelligibility assessment techniques evaluated on speakers treated for head and neck cancer. Computer Speech and Language, 28(2), 467–482. https://doi.org/10.1016/j.csl.2012.10.007
Pommée, T., Balaguer, M., Mauclair, J., Pinquier, J., & Woisard, V. (2021). Assessment of adult speech disorders: Current situation and needs in French-speaking clinical practice. Logopedics Phoniatrics Vocology, 0(0), 1–15. https://doi.org/10.1080/14015439.2020.1870245
Wang, D., & Zheng, T. F. (2015). Transfer learning for speech and language processing. 2015 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA), 1225–1237. https://doi.org/10.1109/APSIPA.2015.7415532