Unsupervised methods for linguistic data

Aaron Steven White

Language and Computation (Introductory)

Second week, from 11:00 to 12:30

Abstract

This course is an introduction to common forms of exploratory analysis used in computational (psycho)linguistics, with a special focus on unsupervised techniques such as clustering, mixture models, and matrix factorization. It is a prerequisite for the proposed advanced-level ESSLLI course "Computational Lexical Semantics." All sessions will be in an interactive tutorial format, using the SciPy stack, the Jupyter notebook platform, and curated datasets. Participants should have beginner-level competence in the Python programming language and have taken an introductory statistics class. Some experience with R may be useful but is not required.