1906.02715 Visualizing and Measuring the Geometry of BERT



Existing representation: word embeddings

Language is made of discrete structures, yet neural networks operate on continuous data: vectors in high-dimensional space.

A successful language-processing network must translate this symbolic information into some kind of geometric representation—but in what form?

Word embeddings provide two well-known examples: distance encodes semantic similarity, while certain directions correspond to polarities (e.g. male vs. female).
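Both properties are easy to see with toy vectors. The following sketch uses hand-made 3-d "embeddings" (illustrative stand-ins, not real learned word vectors) to show distance encoding similarity and a shared direction encoding a male/female polarity:

```python
import numpy as np

# Toy 3-d "embeddings" -- illustrative only; real word vectors are
# learned and live in hundreds of dimensions.
emb = {
    "king":  np.array([0.9, 0.8, 0.1]),
    "queen": np.array([0.9, 0.1, 0.1]),
    "man":   np.array([0.5, 0.8, 0.0]),
    "woman": np.array([0.5, 0.1, 0.0]),
    "apple": np.array([0.0, 0.2, 0.9]),
}

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Distance encodes semantic similarity:
# "king" sits closer to "queen" than to "apple".
assert cosine(emb["king"], emb["queen"]) > cosine(emb["king"], emb["apple"])

# A direction encodes a polarity: the offset man - woman points the
# same way as king - queen (cosine near 1).
polarity_a = emb["man"] - emb["woman"]
polarity_b = emb["king"] - emb["queen"]
print(cosine(polarity_a, polarity_b))
```

With real embeddings such as word2vec or GloVe the same checks hold only approximately, but the geometric picture is identical.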

New representation

A recent, fascinating discovery points to an entirely new type of representation.

One of the key pieces of linguistic information about a sentence is its syntactic structure.

This structure can be represented as a tree whose nodes correspond to words of the sentence.

Hewitt and Manning, in "A structural probe for finding syntax in word representations", show that several language-processing networks construct geometric copies of such syntax trees.

Words are given locations in a high-dimensional space, and (following a certain linear transformation) squared Euclidean distance between these locations maps to tree distance.
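A minimal sketch of the two distances being compared, assuming a toy dependency parse given as parent indices and random stand-ins for the contextual embeddings and the probe matrix B (in the actual probe, B is learned so that the two quantities agree):

```python
import numpy as np
from collections import deque

def tree_distance(parents, i, j):
    """Number of edges between words i and j in a parse tree.

    `parents[k]` is the index of word k's head; -1 marks the root.
    Uses BFS over the undirected tree.
    """
    adj = {k: set() for k in range(len(parents))}
    for child, parent in enumerate(parents):
        if parent >= 0:
            adj[child].add(parent)
            adj[parent].add(child)
    dist = {i: 0}
    queue = deque([i])
    while queue:
        node = queue.popleft()
        if node == j:
            return dist[node]
        for nb in adj[node]:
            if nb not in dist:
                dist[nb] = dist[node] + 1
                queue.append(nb)
    raise ValueError("nodes not connected")

# Toy parse of "The chef who ran made pizza":
# The->chef, chef->made, who->ran, ran->chef, made=root, pizza->made
parents = [1, 4, 3, 1, -1, 4]

# Stand-ins for per-word contextual embeddings H and the probe matrix B
# (here random; the probe trains B so probe_distance ~ tree_distance).
rng = np.random.default_rng(0)
H = rng.normal(size=(6, 768))
B = rng.normal(size=(64, 768))

def probe_distance(i, j):
    """Squared distance after the linear map: ||B(h_i - h_j)||^2."""
    d = B @ (H[i] - H[j])
    return float(d @ d)

print(tree_distance(parents, 0, 5))  # path The -> chef -> made -> pizza
print(probe_distance(0, 5))
```

Training the probe amounts to fitting B by regression so that `probe_distance(i, j)` matches `tree_distance(parents, i, j)` across a treebank; the sentence and dimension sizes above are purely illustrative.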