Associative Memories with Heavy-Tailed Data
Abstract
Learning arguably involves the discovery and memorization of abstract rules. But how do associative memories appear in transformer architectures optimized with gradient descent algorithms? We derive precise scaling laws for a simple input-output associative memory model with respect to parameter size, and discuss the statistical efficiency of different estimators, including optimization-based algorithms. We provide extensive numerical experiments to validate and interpret our theoretical results, including fine-grained visualizations of the stored memory associations.
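To make the setup concrete, here is a minimal sketch of one such input-output associative memory: token associations stored as a weighted sum of outer products of embeddings, with input frequencies following a Zipf (heavy-tailed) law. All names, dimensions, and the Zipf exponent below are illustrative assumptions, not the paper's exact construction.

```python
import numpy as np

# Illustrative sketch (assumed setup, not the paper's exact model):
# store N input->output token associations in a d-dimensional
# outer-product memory, weighting each association by a Zipf
# (heavy-tailed) input frequency.

rng = np.random.default_rng(0)
N, d = 100, 256                                # associations, embedding dimension

E = rng.standard_normal((N, d)) / np.sqrt(d)   # input embeddings e_x
U = rng.standard_normal((N, d)) / np.sqrt(d)   # output embeddings u_y
f = rng.integers(0, N, size=N)                 # arbitrary map y = f(x)

# Zipf weights: frequent inputs contribute more to the stored memory.
p = 1.0 / np.arange(1, N + 1) ** 1.5
p /= p.sum()

# Memory matrix: W = sum_x p(x) u_{f(x)} e_x^T.
W = (U[f] * p[:, None]).T @ E                  # shape (d, d)

# Recall: predict y for input x as argmax_y u_y^T W e_x.
scores = U @ W @ E.T                           # (y, x) score matrix
pred = scores.argmax(axis=0)
print(f"recall accuracy: {(pred == f).mean():.2f}")
```

With a finite embedding dimension d, interference between stored outer products limits recall, and the Zipf weighting means rare associations are the first to be lost; the paper's scaling laws quantify this trade-off as a function of parameter size.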
Cite
Text
Cabannes et al. "Associative Memories with Heavy-Tailed Data." NeurIPS 2023 Workshops: AMHN, 2023.
Markdown
[Cabannes et al. "Associative Memories with Heavy-Tailed Data." NeurIPS 2023 Workshops: AMHN, 2023.](https://mlanthology.org/neuripsw/2023/cabannes2023neuripsw-associative/)
BibTeX
@inproceedings{cabannes2023neuripsw-associative,
title = {{Associative Memories with Heavy-Tailed Data}},
author = {Cabannes, Vivien and Dohmatob, Elvis and Bietti, Alberto},
booktitle = {NeurIPS 2023 Workshops: AMHN},
year = {2023},
url = {https://mlanthology.org/neuripsw/2023/cabannes2023neuripsw-associative/}
}