Open System Categorical Quantum Semantics in Natural Language Processing

Dimitri Kartsaklis

Originally inspired by categorical quantum mechanics, the categorical compositional distributional model of natural language meaning of Coecke, Sadrzadeh and Clark provides a conceptually motivated procedure to compute the meaning of a sentence, given its grammatical structure within a Lambek pregroup and a vectorial representation of the meaning of its parts. This talk discusses an extension of the original model to an open quantum system setting, capable of explicitly handling lexical ambiguity and distinguishing between the two inherently different notions of homonymy and polysemy. This is achieved by using Selinger’s CPM (completely positive maps) construction, which in practice means a passage from word vectors to density operators representing mixed states. Despite this change of model, standard empirical methods for comparing meanings still apply, as demonstrated by a small-scale experiment on real-world data. The enhanced model provides many opportunities for novel theoretical work. For example, commutative classical structures as well as their non-commutative counterparts that arise in the image of the CPM construction allow for encoding relative pronouns, verbs and adjectives. Furthermore, iteration of the CPM construction, something that has no counterpart in the quantum realm, enables one to accommodate both entailment and ambiguity.
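
As a rough illustration of the passage from word vectors to density operators, the sketch below builds a mixed state for an ambiguous word as a probability-weighted sum of projectors onto its sense vectors. It is a minimal toy under stated assumptions, not the model presented in the talk: the three-dimensional sense vectors, the equal sense probabilities, the helper names (`mixed_state`, `trace_product`) and the use of tr(ρσ) as an overlap measure are all chosen here for illustration only.

```python
import math

def normalize(v):
    """Scale a vector to unit length."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def outer(v):
    """Projector |v><v| for a real vector v."""
    return [[a * b for b in v] for a in v]

def mixed_state(senses, probs):
    """Density operator rho = sum_i p_i |v_i><v_i| (hypothetical helper)."""
    dim = len(senses[0])
    rho = [[0.0] * dim for _ in range(dim)]
    for v, p in zip(senses, probs):
        proj = outer(normalize(v))
        for i in range(dim):
            for j in range(dim):
                rho[i][j] += p * proj[i][j]
    return rho

def trace(a):
    return sum(a[i][i] for i in range(len(a)))

def trace_product(a, b):
    """tr(a b), used here as a simple overlap measure (an assumption)."""
    n = len(a)
    return sum(a[i][j] * b[j][i] for i in range(n) for j in range(n))

# Toy sense vectors (assumed): axis 0 = finance, axis 1 = river.
bank = mixed_state([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]], [0.5, 0.5])
river = mixed_state([[0.0, 1.0, 0.0]], [1.0])  # unambiguous word: pure state

purity_bank = trace_product(bank, bank)     # below 1 for a mixed state
purity_river = trace_product(river, river)  # equal to 1 for a pure state
overlap = trace_product(bank, river)
```

In this toy setting the purity tr(ρ²) separates the ambiguous word from the unambiguous one, while the overlap tr(ρσ) plays the role that the inner product plays for plain word vectors.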

Joint work with Piedeleu, Coecke and Sadrzadeh.