
Seminars

Because IRIT is located on several sites, its seminars are organized and held at the Université Toulouse 3 Paul Sabatier (UT3), the Université Toulouse 1 Capitole (UT1), INP-ENSEEIHT, or the Université Toulouse 2 Jean Jaurès (UT2J).

Usage patterns of non-native language speakers discovered by string kernels for native language identification

Radu TUDOR IONESCU - University of Bucharest (Romania)

Thursday 7 February 2019, 11:00 - 12:00
UT3 Paul Sabatier, IRIT, Salle du Conseil (Council Room)

Abstract

Recently, an approach that uses only character p-grams (contiguous sequences of p characters) as features has been proposed for native language identification (NLI), the task of identifying an author's native language from text written in another language. The approach obtained state-of-the-art results by combining several string kernels through multiple kernel learning. This presentation reports a broad set of NLI experiments comparing the string kernels approach with other state-of-the-art methods, and the empirical results indicate that it achieves state-of-the-art performance. To gain further insight into the approach, the presentation analyzes the features that the classifier selects as most discriminative. Because the model uses p-grams of various lengths, this analysis also reveals localized language transfer effects. The captured features typically include stems, function words, and word prefixes and suffixes, which have the potential to generalize beyond purely word-based features. By examining these discriminative features, the presentation offers insights into two kinds of language transfer effects: word choice (lexical transfer) and morphological differences.
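The abstract does not give implementation details, so the following is only a minimal sketch of how character p-gram string kernels can be computed and combined. It assumes three kernels of the kind often used with p-grams (a p-spectrum kernel on counts, an intersection kernel, and a presence-bits kernel), a cosine-style normalization, a sum over a hypothetical range of p-gram lengths (3 to 5), and a fixed, equal-weight sum of kernels standing in for multiple kernel learning, which would instead learn the weights. All function names and defaults are hypothetical; this is not the speaker's implementation.

```python
import math
from collections import Counter
from typing import Callable, Iterable, List, Sequence

# A kernel takes two strings and a p-gram length and returns a similarity.
Kernel = Callable[[str, str, int], float]


def char_pgrams(text: str, p: int) -> Counter:
    """Count every contiguous character p-gram in `text` (empty if len < p)."""
    return Counter(text[i:i + p] for i in range(len(text) - p + 1))


def spectrum_kernel(s: str, t: str, p: int) -> float:
    """Dot product of the p-gram count vectors of s and t."""
    cs, ct = char_pgrams(s, p), char_pgrams(t, p)
    return float(sum(n * ct[g] for g, n in cs.items()))


def intersection_kernel(s: str, t: str, p: int) -> float:
    """Sum, over shared p-grams, of the smaller of the two counts."""
    cs, ct = char_pgrams(s, p), char_pgrams(t, p)
    return float(sum(min(n, ct[g]) for g, n in cs.items()))


def presence_kernel(s: str, t: str, p: int) -> float:
    """Number of distinct p-grams that occur in both strings."""
    return float(len(char_pgrams(s, p).keys() & char_pgrams(t, p).keys()))


def normalized(kernel: Kernel, s: str, t: str, p: int) -> float:
    """Cosine-style normalization: k(s,t) / sqrt(k(s,s) * k(t,t))."""
    denom = math.sqrt(kernel(s, s, p) * kernel(t, t, p))
    return kernel(s, t, p) / denom if denom else 0.0


def blended(kernel: Kernel, s: str, t: str, p_range: Iterable[int]) -> float:
    """Sum the normalized kernel over several p-gram lengths."""
    return sum(normalized(kernel, s, t, p) for p in p_range)


def combined(s: str, t: str,
             p_range: Sequence[int] = range(3, 6),
             weights: Sequence[float] = (1.0, 1.0, 1.0)) -> float:
    """Fixed-weight sum of the three blended kernels.

    Multiple kernel learning would learn `weights`; here they are fixed
    (hypothetical default: equal weights).
    """
    kernels = (spectrum_kernel, intersection_kernel, presence_kernel)
    return sum(w * blended(k, s, t, p_range) for w, k in zip(weights, kernels))


def gram_matrix(docs: Sequence[str], **kwargs) -> List[List[float]]:
    """Pairwise combined-kernel matrix over a list of documents."""
    return [[combined(a, b, **kwargs) for b in docs] for a in docs]


if __name__ == "__main__":
    print(spectrum_kernel("banana", "bandana", 2))      # 7.0
    print(intersection_kernel("banana", "bandana", 2))  # 4.0
    print(presence_kernel("banana", "bandana", 2))      # 3.0
```

In a classification setting, the matrix returned by `gram_matrix` could be passed to any learner that accepts precomputed kernels, for example scikit-learn's `SVC(kernel="precomputed")`. Because the features are raw p-grams, the weight a trained model assigns to each p-gram can then be inspected directly, which is what makes the kind of feature analysis described above possible.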