# Introduction to Categorical Compositional Distributional Semantics

### Dimitrios Kartsaklis, Martha Lewis

### Language and Computation (Introductory)

### Second week, from 14:00 to 15:30, room G

## Abstract

We present an introductory course on the emerging field of *categorical compositional distributional semantics* (informally referred to as the DisCo model). Inspired by quantum protocols, the DisCo model has provided a convincing account of compositionality in vector space models of NLP, unifying the two orthogonal paradigms of formal semantics and distributional models of meaning. The resulting setting systematically extends vector space models from words to sentences, enabling them to reason about sentence meaning with the same tools as for word meaning. Based on the rigorous mathematical framework of compact closed categories, the model has made possible novel approaches to language-related problems and has allowed the theoretical study of compositional aspects of distributional models of meaning. This course is designed to provide a comprehensive introduction to the field for students and researchers, covering mathematical and linguistic foundations, past and current research, and discussing advanced topics and open problems.

## Description

### Motivation

Formal semantics and distributional semantics each offer successful but orthogonal ways of understanding meaning. Formal semantics takes the meanings of words as given, and shows how the meaning of phrases, sentences, and other word combinations can be computed from the meanings of their constituents. In contrast, distributional semantics provides a computationally tractable and philosophically motivated way of determining the meanings of words from their use in written and spoken language, but does not give a means of combining these to get semantic representations of larger text constituents, such as phrases and sentences. Unifying these two approaches is an area of active research, using a range of different techniques from general linear algebra approaches to deep neural networks.

The programme of *categorical compositional distributional semantics* (informally referred to as the DisCo model), originally introduced in (Coecke et al., 2010), uses the mathematical framework of category theory to elucidate crucial structural similarities between formal symbolic approaches to language, and the vector space structure used by a distributional account of semantics. This framework allows a very general description of the way that compositionality can be applied to the distributional programme, unlike some approaches which focus just on certain grammatical constructions such as adjective-noun or noun-verb combinations, or neural network approaches which make no structural distinctions between different grammatical types. Furthermore, the structures that are used in the DisCo framework turn out to be identical to structures that explain the behaviour of quantum-mechanical systems. The link between physics and language has made possible a unique perspective in approaching language-related problems, such as lexical ambiguity (Piedeleu et al., 2015) or entailment (Balkir, 2015). Mathematical structures such as Frobenius algebras and bialgebras have been used to allow the explication of functional words such as relative pronouns (Sadrzadeh et al., 2013), to model linguistic aspects such as coordination (Kartsaklis, 2016) and intonation (Kartsaklis and Sadrzadeh, 2015), and to provide accounts of quantification in distributional models (Sadrzadeh, 2016). The underlying semantic description can be generalized from a vector space to any category which has the requisite compact closed structure, such as sets and relations (Marsden, 2016), density matrices (Piedeleu et al., 2015) or conceptual spaces (Bolt et al., 2016).
All along the way, the diagrammatic calculus (Coecke and Kissinger, 2016) of categorical quantum mechanics simplifies these computations, allowing for a depiction of the flow of meaning within sentences, using methods similar to those used for quantum protocols such as teleportation.

On the practical side, the DisCo framework has produced results in the area of compositional distributional semantics which have outperformed other algebraic approaches (Kartsaklis et al., 2014; Kartsaklis and Sadrzadeh, 2013; Grefenstette and Sadrzadeh, 2011), and has been computationally applied to a number of NLP tasks, such as disambiguation, sentence and phrase similarity, and textual entailment. On the theoretical side, its rigorous mathematical foundations provide a test-bed for studying compositional aspects of language at a level deeper than most practically-oriented approaches would allow. The topic attracts a number of submissions to journals and to annual NLP, computer science and logic conferences and workshops. The recent workshop on "Semantic Spaces at the Intersection of NLP, Physics and Cognitive Science", which was co-located with the 2016 conference on Quantum Physics and Logic (QPL), provided a forum for researchers working on the DisCo model and fostered theoretically motivated approaches to understanding how meanings of words interact with each other in sentences and discourse.

### Outline of the course

In this course we will provide an introduction to categorical compositional distributional semantics (the DisCo model). The first two sessions will cover introductory material for the course and the basics of the model. We will present the foundational mathematics needed, namely aspects of category theory and linear algebra. The diagrammatic calculus of compact closed categories constitutes one of our main tools for demonstrating the various concepts, and as such will be presented in the first lecture. We then go on to introduce the DisCo model, giving the grammatical formalism and a description of a vector space model of semantics. The grammatical formalism used in the course is that of *pregroup grammar*, although other grammars may also be used, and we go on to show how a pregroup grammar can be understood as a compact closed category. On the linguistic side, we explain the connections of pregroup grammars with categorial grammars, and we show why in their simpler forms these two formalisms are interchangeable. We will then describe in detail how finite-dimensional vector spaces, of the kind used in computational linguistics, also form a compact closed category. This common structure allows us to map the grammatical reductions of the pregroup grammar to linear maps in the vector space. This forms the kernel of the DisCo framework.
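As a minimal sketch of this syntax-to-semantics passage: for a transitive sentence, the pregroup reduction n · (nʳ s nˡ) · n → s is interpreted as tensor contraction of the subject and object vectors against a verb tensor living in N ⊗ S ⊗ N. The tiny spaces, word vectors and verb tensor below are toy assumptions for illustration, not trained representations.

```python
import numpy as np

# Toy 2-dimensional noun space N and sentence space S (hypothetical).
subj = np.array([1.0, 0.0])              # hypothetical vector for "dogs"
obj = np.array([0.0, 1.0])               # hypothetical vector for "cats"
verb = np.arange(8.0).reshape(2, 2, 2)   # hypothetical tensor for "chase" in N ⊗ S ⊗ N

# The pregroup reduction becomes the contraction
#   sentence_s = sum_{i,j} subj_i * verb_{i s j} * obj_j
sentence = np.einsum('i,isj,j->s', subj, verb, obj)

print(sentence)  # a vector in the sentence space S: [1. 3.]
```

With one-hot subject and object vectors the contraction simply picks out the slice `verb[0, :, 1]`; with realistic distributional vectors it mixes all slices, which is exactly the "flow of meaning" the diagrams depict.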

The last three lectures of the course will show how the core of the model can be extended to include more phenomena and to describe a number of aspects of language use. We continue by giving a description of Frobenius algebras, and how these can be interpreted in language. We will lay out how Frobenius algebras have been used to model a number of aspects of language, such as relative pronouns, intonation, coordination, and non-compositional aspects of language. The DisCo framework utilises quantum-theoretic structures, and we go on to describe how specific tools from quantum theory, namely density operators, may be used to address lexical ambiguity and entailment in language. Finally, we introduce a series of advanced topics in the field, each of which extends or generalises the DisCo framework.
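To give a flavour of the density-operator treatment of ambiguity: an ambiguous word can be represented as a probabilistic mixture of its sense vectors, ρ = Σᵢ pᵢ |vᵢ⟩⟨vᵢ|, and its von Neumann entropy then quantifies the ambiguity (zero for an unambiguous, pure state). The sense vectors and probabilities below are hypothetical toy data, not taken from the cited models.

```python
import numpy as np

def density(vectors, probs):
    """Mixture of rank-1 projectors onto normalised sense vectors."""
    dim = len(vectors[0])
    rho = np.zeros((dim, dim))
    for v, p in zip(vectors, probs):
        v = v / np.linalg.norm(v)
        rho += p * np.outer(v, v)
    return rho

def von_neumann_entropy(rho):
    """S(rho) = -sum_i e_i log2 e_i over the nonzero eigenvalues."""
    eigvals = np.linalg.eigvalsh(rho)
    eigvals = eigvals[eigvals > 1e-12]
    return float(-np.sum(eigvals * np.log2(eigvals)))

# "bank": a financial sense and a river sense, assumed equally likely
senses = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
rho = density(senses, [0.5, 0.5])

print(np.trace(rho))             # 1.0, a valid density operator
print(von_neumann_entropy(rho))  # 1.0 bit: maximally ambiguous
```

A single-sense word gives a pure state with entropy 0; composing such operators with grammar-derived maps is what lets the model track how context resolves ambiguity.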

### Syllabus

- **Lecture 1:** *Introduction.* Introduction to category theory - Compact closed categories - Diagrammatic calculus - Compositional and distributional models of meaning - Unifying the two paradigms
- **Lecture 2:** *Categorical Compositional Distributional Semantics.* Pregroup grammars - The category **FdVect** - A syntax-to-semantics passage - A multi-linear model - Detailed examples
- **Lecture 3:** *Frobenius Algebras: Role and Applications.* Introduction to Frobenius Algebras - Merging and Copying in language - Applications: Type-raising, relative pronouns, intonation, coordination, non-compositional compounds
- **Lecture 4:** *A Quantum Perspective: From Vectors to Density Operators.* Short introduction to Quantum Mechanics - Vectors as states - The CPM construction - Density operators for ambiguity - Density operators for entailment
- **Lecture 5:** *Advanced Topics, Future Work and Conclusions.* Quantification - Logic - Dual Density Operators - Quantum Algorithms - Applications to Conceptual Spaces - Conclusions

### Assignments and exercises

We will provide two sets of exercises to allow participants to check their understanding and to get hands-on practical experience in using the DisCo framework. These will involve linguistic examples, such as assigning the correct pregroup types to short phrases and sentences, deriving their parses, and translating these to the vector space setting. Further exercises will involve computing sentence meanings with toy examples from a short corpus specifically designed for the course.

The first set of exercises will be made available after the first two lectures, covering this introductory material. The second set will be made available after the fourth lecture, and will cover extensions and more advanced topics of the DisCo model.

### Prerequisites

Some familiarity with basic category theory and linear algebra would be helpful but is not necessary. Knowledge of advanced topics, such as quantum theory, is not a requirement.

### Recommended reading

**Key publications**

- Clark, S., Coecke, B., Sadrzadeh, M. (2008). A Compositional Distributional Model of Meaning.
- Coecke, B., Sadrzadeh, M., Clark, S. (2010). Mathematical Foundations for a Compositional Distributional Model of Meaning.

**Extensions and implementations**

- Grefenstette, E., Sadrzadeh, M. (2011). Experimental Support for a Categorical Compositional Distributional Model of Meaning.
- Kartsaklis, D., Sadrzadeh, M., Pulman, S., Coecke, B. (2013). Reasoning about Meaning in Natural Language with Compact Closed Categories and Frobenius Algebras.
- Sadrzadeh, M., Clark, S., Coecke, B. (2013). The Frobenius Anatomy of Relative Pronouns.
- Piedeleu, R., Kartsaklis, D., Coecke, B., Sadrzadeh, M. (2015). Open System Quantum Semantics for Natural Language Processing.