Inverse Distance Weighting Attention
Abstract
We report the effects of replacing the scaled dot-product (within-softmax) attention with the negative log of Euclidean distance. This form of attention simplifies to inverse distance weighting interpolation. Used in simple one-hidden-layer networks and trained with vanilla cross-entropy loss on classification problems, it tends to produce a key matrix containing prototypes and a value matrix with corresponding logits. We also show that the resulting interpretable networks can be augmented with manually constructed prototypes to perform low-impact handling of special cases.
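A minimal sketch may help make the simplification concrete: since softmax(-p * log d_i) = d_i^(-p) / sum_j d_j^(-p), the attention weights are exactly inverse-distance-weighting coefficients with exponent p. The PyTorch snippet below is an illustrative sketch under that identity, not the paper's reference implementation; the function name `idw_attention`, the `power` argument, and the `eps` guard against log(0) are assumptions.

```python
import torch

def idw_attention(queries, keys, values, power=1.0, eps=1e-8):
    # Illustrative sketch, not the paper's code.
    # Pairwise Euclidean distances, shape (n_queries, n_keys);
    # eps avoids log(0) when a query coincides with a key.
    dists = torch.cdist(queries, keys) + eps
    # softmax(-power * log d) == d**(-power) / sum_j d_j**(-power),
    # i.e. inverse distance weighting with exponent `power`.
    weights = torch.softmax(-power * torch.log(dists), dim=-1)
    return weights @ values

# Toy usage: 4 queries, 3 prototypes, 5 classes.
q = torch.randn(4, 16)
k = torch.randn(3, 16)           # key matrix: learned prototypes
v = torch.randn(3, 5)            # value matrix: per-prototype logits
logits = idw_attention(q, k, v)  # shape (4, 5)
```

In the one-hidden-layer classifier described in the abstract, `keys` would play the role of the learned prototypes and `values` their corresponding logits; a manually constructed special case could be handled by appending a row to each.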
Cite
Text
McCarter. "Inverse Distance Weighting Attention." NeurIPS 2023 Workshops: AMHN, 2023.

Markdown
[McCarter. "Inverse Distance Weighting Attention." NeurIPS 2023 Workshops: AMHN, 2023.](https://mlanthology.org/neuripsw/2023/mccarter2023neuripsw-inverse/)

BibTeX
@inproceedings{mccarter2023neuripsw-inverse,
title = {{Inverse Distance Weighting Attention}},
author = {McCarter, Calvin},
booktitle = {NeurIPS 2023 Workshops: AMHN},
year = {2023},
url = {https://mlanthology.org/neuripsw/2023/mccarter2023neuripsw-inverse/}
}