Sparse Modern Hopfield Networks

Abstract

Ramsauer et al. (2021) recently pointed out a connection between modern Hopfield networks and attention heads in transformers. In this paper, we extend their framework to a broader family of energy functions which can be written as a difference of a quadratic regularizer and a Fenchel-Young loss (Blondel et al., 2020), parametrized by a generalized negentropy function $\Omega$. By working with Tsallis negentropies, the resulting update rules become end-to-end differentiable sparse transformations, establishing a new link to adaptively sparse transformers (Correia et al., 2019) and allowing for exact convergence to single memory patterns. Experiments on simulated data show a higher tendency to avoid metastable states.
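The sparse update rule the abstract refers to can be illustrated with a small sketch. The snippet below is an illustrative NumPy example, not the authors' code: it swaps the softmax in Ramsauer et al.'s retrieval update $\xi \leftarrow X^\top \mathrm{softmax}(\beta X \xi)$ for sparsemax, the Tsallis $\alpha = 2$ case of the sparse transformations mentioned above, so a sufficiently well-separated query yields a one-hot weight vector and hence exact convergence to a single memory pattern. The memory matrix, inverse temperature, and toy data are assumptions chosen for illustration.

```python
# Minimal sketch (not the authors' implementation): sparse Hopfield retrieval
# obtained by replacing softmax with sparsemax in the update xi <- X^T p(beta*X*xi).
import numpy as np

def sparsemax(z):
    """Project the score vector z onto the probability simplex (Martins & Astudillo, 2016)."""
    z_sorted = np.sort(z)[::-1]                 # scores in decreasing order
    cumsum = np.cumsum(z_sorted)
    k = np.arange(1, z.size + 1)
    in_support = 1 + k * z_sorted > cumsum      # condition defining the support size
    k_star = k[in_support][-1]
    tau = (cumsum[k_star - 1] - 1) / k_star     # threshold subtracted from every score
    return np.maximum(z - tau, 0.0)

def sparse_hopfield_retrieve(X, query, beta=1.0, n_iter=10):
    """Iterate the sparse retrieval update q <- X^T sparsemax(beta * X q)."""
    q = query
    for _ in range(n_iter):
        q = X.T @ sparsemax(beta * X @ q)
    return q

# Toy usage: rows of X are stored memory patterns; the query is a noisy copy of one of them.
rng = np.random.default_rng(0)
X = rng.standard_normal((3, 8))
query = X[1] + 0.3 * rng.standard_normal(8)
retrieved = sparse_hopfield_retrieve(X, query, beta=2.0)
print(np.linalg.norm(retrieved - X, axis=1))    # once sparsemax is one-hot, one distance is exactly 0
```

Unlike softmax, sparsemax can assign exactly zero weight to all but one stored pattern, which is what allows the retrieved vector to coincide exactly with a memory rather than only approach it.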

Cite

Text

Martins et al. "Sparse Modern Hopfield Networks." NeurIPS 2023 Workshops: AMHN, 2023.

Markdown

[Martins et al. "Sparse Modern Hopfield Networks." NeurIPS 2023 Workshops: AMHN, 2023.](https://mlanthology.org/neuripsw/2023/martins2023neuripsw-sparse/)

BibTeX

@inproceedings{martins2023neuripsw-sparse,
  title     = {{Sparse Modern Hopfield Networks}},
  author    = {Martins, Andre and Niculae, Vlad and McNamee, Daniel C.},
  booktitle = {NeurIPS 2023 Workshops: AMHN},
  year      = {2023},
  url       = {https://mlanthology.org/neuripsw/2023/martins2023neuripsw-sparse/}
}