Compositional Attention: Disentangling Search and Retrieval

Abstract

Multi-head, key-value attention is the backbone of transformer-like model architectures, which have proven widely successful in recent years. This attention mechanism uses multiple parallel key-value attention blocks (called heads), each performing two fundamental computations: (1) search: selection of a relevant entity from a set via query-key interaction, and (2) retrieval: extraction of relevant features from the selected entity via a value matrix. Standard attention heads learn a rigid mapping between search and retrieval. In this work, we first highlight how the static nature of this pairing can potentially: (a) lead to learning redundant parameters in certain tasks, and (b) hinder generalization. To alleviate this problem, we propose a novel attention mechanism, called Compositional Attention, that replaces the standard head structure. The proposed mechanism disentangles search and retrieval and composes them in a dynamic, flexible, and context-dependent manner. Through a series of numerical experiments, we show that it outperforms standard multi-head attention on a variety of tasks, including some out-of-distribution settings. Through our qualitative analysis, we demonstrate that Compositional Attention leads to dynamic specialization based on the type of retrieval needed. Our proposed mechanism generalizes multi-head attention, allows independent scaling of search and retrieval, and is easy to implement in a variety of established network architectures.
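The contrast between a fixed search-retrieval pairing and a dynamic one can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The function names, tensor shapes, and in particular the way a retrieval is chosen for each search (a second softmax over dot products between a per-search query and keys computed from the candidate outputs) are assumptions made for illustration only.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(x, Wq, Wk, Wv):
    # x: (n, d); Wq, Wk, Wv: (h, d, dh).
    q = np.einsum('nd,hde->hne', x, Wq)
    k = np.einsum('nd,hde->hne', x, Wk)
    v = np.einsum('nd,hde->hne', x, Wv)
    attn = softmax(q @ k.transpose(0, 2, 1) / np.sqrt(q.shape[-1]))
    # Rigid pairing: search i is always combined with value matrix i.
    return attn @ v  # (h, n, dh)

def compositional_attention(x, Wq, Wk, Wv, Wq2, Wk2):
    # Assumed shapes: Wq, Wk: (S, d, dh) for S searches; Wv: (R, d, dh) for
    # R retrievals; Wq2: (S, d, dr) and Wk2: (dh, dr) for retrieval selection.
    q = np.einsum('nd,sde->sne', x, Wq)
    k = np.einsum('nd,sde->sne', x, Wk)
    v = np.einsum('nd,rde->rne', x, Wv)
    search = softmax(q @ k.transpose(0, 2, 1) / np.sqrt(q.shape[-1]))  # (S, n, n)
    # Apply every search to every retrieval: S * R candidate outputs.
    cand = np.einsum('snm,rme->snre', search, v)  # (S, n, R, dh)
    # Context-dependent soft choice of a retrieval for each search and token.
    q2 = np.einsum('nd,sdf->snf', x, Wq2)         # (S, n, dr)
    k2 = cand @ Wk2                               # (S, n, R, dr)
    scores = np.einsum('snf,snrf->snr', q2, k2) / np.sqrt(Wk2.shape[-1])
    w = softmax(scores, axis=-1)                  # (S, n, R)
    out = np.einsum('snr,snre->sne', w, cand)     # (S, n, dh)
    return out, w

# Toy usage with random weights; all sizes are arbitrary.
rng = np.random.default_rng(0)
n, d, dh, dr, S, R = 5, 8, 4, 3, 4, 2
x = rng.normal(size=(n, d))
Wq, Wk = rng.normal(size=(S, d, dh)), rng.normal(size=(S, d, dh))
Wv = rng.normal(size=(R, d, dh))
Wq2, Wk2 = rng.normal(size=(S, d, dr)), rng.normal(size=(dh, dr))
out, w = compositional_attention(x, Wq, Wk, Wv, Wq2, Wk2)
```

In this sketch the number of searches `S` and the number of retrievals `R` are independent, which mirrors the independent scaling mentioned in the abstract. With `S == R` and selection weights fixed to the identity pairing, the computation reduces to the standard multi-head case.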

Cite

Text

Mittal et al. "Compositional Attention: Disentangling Search and Retrieval." International Conference on Learning Representations, 2022.

Markdown

[Mittal et al. "Compositional Attention: Disentangling Search and Retrieval." International Conference on Learning Representations, 2022.](https://mlanthology.org/iclr/2022/mittal2022iclr-compositional/)

BibTeX

@inproceedings{mittal2022iclr-compositional,
  title     = {{Compositional Attention: Disentangling Search and Retrieval}},
  author    = {Mittal, Sarthak and Raparthy, Sharath Chandra and Rish, Irina and Bengio, Yoshua and Lajoie, Guillaume},
  booktitle = {International Conference on Learning Representations},
  year      = {2022},
  url       = {https://mlanthology.org/iclr/2022/mittal2022iclr-compositional/}
}