Inverse Distance Weighting Attention

Abstract

We report the effects of replacing the scaled dot-product (within the softmax) in attention with the negative log of the Euclidean distance. This form of attention simplifies to inverse distance weighting interpolation. When used in simple one-hidden-layer networks and trained with vanilla cross-entropy loss on classification problems, it tends to produce a key matrix containing prototypes and a value matrix with corresponding logits. We also show that the resulting interpretable networks can be augmented with manually constructed prototypes to perform low-impact handling of special cases.
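As a rough illustration of the mechanism the abstract describes, the sketch below implements the attention variant in PyTorch. Because softmax exponentiates its inputs, negative-log-distance scores reduce exactly to inverse distance weights: softmax_j(−p·log d_ij) = d_ij^(−p) / Σ_j′ d_ij′^(−p). The function name `idw_attention`, the power parameter `p`, and the `eps` smoothing term are assumptions made for this example, not details taken from the paper.

```python
import torch

def idw_attention(q, k, v, p=1.0, eps=1e-8):
    """Attention with negative-log-Euclidean-distance scores (sketch).

    softmax_j(-p * log d_ij) = d_ij^(-p) / sum_j' d_ij'^(-p),
    i.e. inverse distance weighting (Shepard) interpolation over values.
    `p` and `eps` are illustrative hyperparameters, not from the paper.
    """
    dist = torch.cdist(q, k)                 # (n, m) pairwise Euclidean distances
    scores = -p * torch.log(dist + eps)      # eps guards against log(0) at exact matches
    weights = torch.softmax(scores, dim=-1)  # identical to normalized d^(-p) IDW weights
    return weights @ v

# One-hidden-layer classifier use: keys act as prototypes, values as logits.
q = torch.randn(4, 16)        # 4 query embeddings
k = torch.randn(10, 16)       # 10 learned prototype keys
v = torch.randn(10, 3)        # per-prototype logits over 3 classes
out = idw_attention(q, k, v)  # (4, 3) IDW-interpolated logits

# Low-impact special-case handling: append a hand-built prototype/logit pair.
k_special = torch.zeros(1, 16)             # manually constructed prototype (hypothetical)
v_special = torch.tensor([[0., 0., 9.]])   # strong logit for class 2 near this prototype
out2 = idw_attention(q, torch.cat([k, k_special]), torch.cat([v, v_special]))
```

Because IDW weights decay with distance, the appended prototype dominates only for queries close to it and leaves predictions elsewhere essentially unchanged, which is one way to read the "low-impact" claim.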

Cite

Text

McCarter. "Inverse Distance Weighting Attention." NeurIPS 2023 Workshops: AMHN, 2023.

Markdown

[McCarter. "Inverse Distance Weighting Attention." NeurIPS 2023 Workshops: AMHN, 2023.](https://mlanthology.org/neuripsw/2023/mccarter2023neuripsw-inverse/)

BibTeX

@inproceedings{mccarter2023neuripsw-inverse,
  title     = {{Inverse Distance Weighting Attention}},
  author    = {McCarter, Calvin},
  booktitle = {NeurIPS 2023 Workshops: AMHN},
  year      = {2023},
  url       = {https://mlanthology.org/neuripsw/2023/mccarter2023neuripsw-inverse/}
}