Self-Attention Between Datapoints: Going Beyond Individual Input-Output Pairs in Deep Learning

Abstract

We challenge a common assumption underlying most supervised deep learning: that a model makes a prediction depending only on its parameters and the features of a single input. To this end, we introduce a general-purpose deep learning architecture that takes as input the entire dataset instead of processing one datapoint at a time. Our approach uses self-attention to reason about relationships between datapoints explicitly, which can be seen as realizing non-parametric models using parametric attention mechanisms. However, unlike conventional non-parametric models, we let the model learn end-to-end from the data how to make use of other datapoints for prediction. Empirically, our models solve cross-datapoint lookup and complex reasoning tasks unsolvable by traditional deep learning models. We show highly competitive results on tabular data, early results on CIFAR-10, and give insight into how the model makes use of the interactions between points.
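The core idea in the abstract, attention applied across the rows of a dataset rather than within a single input, can be illustrated with a minimal NumPy sketch. This is not the authors' architecture: it assumes a single attention head, random untrained projection matrices, and no feature embeddings or stacked layers. The names `attention_between_datapoints`, `W_q`, `W_k`, and `W_v` are hypothetical and chosen only for illustration.

```python
import numpy as np


def softmax(z, axis=-1):
    # Subtract the row-wise maximum for numerical stability.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def attention_between_datapoints(X, W_q, W_k, W_v):
    """Single-head scaled dot-product attention over the rows of X.

    X has shape (n, d): n datapoints with d features each. Every row
    attends to every row, so each output row depends on the whole dataset.
    """
    Q, K, V = X @ W_q, X @ W_k, X @ W_v
    scores = Q @ K.T / np.sqrt(K.shape[1])  # (n, n) datapoint-to-datapoint scores
    weights = softmax(scores, axis=1)       # each row sums to 1
    return weights @ V, weights


# Illustrative data: 6 datapoints, 4 features, 8 hidden units (all assumed).
rng = np.random.default_rng(0)
n, d, h = 6, 4, 8
X = rng.normal(size=(n, d))
W_q, W_k, W_v = (rng.normal(size=(d, h)) / np.sqrt(d) for _ in range(3))

out, weights = attention_between_datapoints(X, W_q, W_k, W_v)

# Perturbing datapoint 3 changes the representation of datapoint 0,
# which a model that processes one input at a time would not do.
X_perturbed = X.copy()
X_perturbed[3] += 1.0
out_perturbed, _ = attention_between_datapoints(X_perturbed, W_q, W_k, W_v)
```

In this sketch the attention matrix has shape `(n, n)`, so the prediction for one datapoint is a learned, weighted combination of all datapoints. This is the sense in which the abstract describes the approach as realizing non-parametric behaviour with a parametric attention mechanism.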

Cite

Text

Kossen et al. "Self-Attention Between Datapoints: Going Beyond Individual Input-Output Pairs in Deep Learning." Neural Information Processing Systems, 2021.

Markdown

[Kossen et al. "Self-Attention Between Datapoints: Going Beyond Individual Input-Output Pairs in Deep Learning." Neural Information Processing Systems, 2021.](https://mlanthology.org/neurips/2021/kossen2021neurips-selfattention/)

BibTeX

@inproceedings{kossen2021neurips-selfattention,
  title     = {{Self-Attention Between Datapoints: Going Beyond Individual Input-Output Pairs in Deep Learning}},
  author    = {Kossen, Jannik and Band, Neil and Lyle, Clare and Gomez, Aidan N and Rainforth, Thomas and Gal, Yarin},
  booktitle = {Neural Information Processing Systems},
  year      = {2021},
  url       = {https://mlanthology.org/neurips/2021/kossen2021neurips-selfattention/}
}