Visualizing and Measuring the Geometry of BERT

Abstract

Transformer architectures show significant promise for natural language processing. Given that a single pretrained model can be fine-tuned to perform well on many different tasks, these networks appear to extract generally useful linguistic features. A natural question is how such networks represent this information internally. This paper describes qualitative and quantitative investigations of one particularly effective model, BERT. At a high level, linguistic features seem to be represented in separate semantic and syntactic subspaces. We find evidence of a fine-grained geometric representation of word senses. We also present empirical descriptions of syntactic representations in both attention matrices and individual word embeddings, as well as a mathematical argument to explain the geometry of these representations.
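As a concrete point of reference, the sketch below shows how one might extract the per-token contextual embeddings and per-head attention matrices that geometric analyses of this kind operate on. It is a minimal illustration, assuming the HuggingFace transformers and PyTorch libraries (neither is prescribed by the paper), and the example sentence is hypothetical.

from transformers import BertTokenizer, BertModel
import torch

# Load BERT-base with hidden states and attention outputs enabled.
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained(
    "bert-base-uncased",
    output_hidden_states=True,
    output_attentions=True,
)
model.eval()

# Hypothetical example sentence containing an ambiguous word ("bank").
inputs = tokenizer("The boat drifted toward the river bank.", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Contextual embeddings: a tuple of 13 tensors (embedding layer + 12
# transformer layers), each of shape (1, sequence_length, 768).
hidden_states = outputs.hidden_states

# Attention matrices: a tuple of 12 tensors, one per layer, each of shape
# (1, 12 heads, sequence_length, sequence_length).
attentions = outputs.attentions

In this spirit, word-sense structure can be examined by collecting the token vectors for a given word across many sentences and projecting them (e.g., with PCA or UMAP), while syntactic structure can be probed from the attention matrices or the embeddings themselves, along the lines of the analyses the abstract describes.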

Cite

Text

Reif et al. "Visualizing and Measuring the Geometry of BERT." Neural Information Processing Systems, 2019.

Markdown

[Reif et al. "Visualizing and Measuring the Geometry of BERT." Neural Information Processing Systems, 2019.](https://mlanthology.org/neurips/2019/reif2019neurips-visualizing/)

BibTeX

@inproceedings{reif2019neurips-visualizing,
  title     = {{Visualizing and Measuring the Geometry of BERT}},
  author    = {Reif, Emily and Yuan, Ann and Wattenberg, Martin and Viegas, Fernanda B. and Coenen, Andy and Pearce, Adam and Kim, Been},
  booktitle = {Neural Information Processing Systems},
  year      = {2019},
  pages     = {8594--8603},
  url       = {https://mlanthology.org/neurips/2019/reif2019neurips-visualizing/}
}