OutEffHop: A Principled Outlier-Efficient Attention Layer from Dense Associative Memory Models
Abstract
We introduce a principled approach to Outlier-Efficient Attention Layers via associative memory models to reduce outlier emergence in large transformer-based models. Our main contribution is a novel associative memory model that facilitates outlier-efficient associative memory retrievals. This model subsumes the outlier-efficient attention mechanism (`Softmax_1`) as a special case of its memory retrieval process. Methodologically, this enables the introduction of novel outlier-efficient Hopfield layers as powerful alternatives to traditional attention mechanisms, offering superior post-quantization performance. Empirically, we demonstrate the efficacy of the proposed model across large-scale transformer-based and Hopfield-based models, including BERT, OPT, ViT, and STanHop-Net, benchmarking against state-of-the-art methods such as `Clipped_Softmax` and `Gated_Attention`. Notably, our method achieves an average reduction of over 22\% in average kurtosis and over 26\% in the maximum infinity norm of model outputs across the four models, without sacrificing model performance after quantization.
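For context, the `Softmax_1` mechanism referenced in the abstract replaces the standard softmax denominator with `1 + Σ_j exp(x_j)`, which is equivalent to appending an implicit zero logit so that attention weights may sum to less than one and heads need not dump probability mass onto irrelevant tokens (a known driver of activation outliers). The following is a minimal PyTorch sketch of that mechanism, written here for illustration and not taken from the authors' released code:

```python
import torch


def softmax_1(logits: torch.Tensor, dim: int = -1) -> torch.Tensor:
    """Softmax with an implicit extra zero logit: exp(x_i) / (1 + sum_j exp(x_j))."""
    # Shift by the max for numerical stability (standard log-sum-exp trick).
    max_logit = logits.max(dim=dim, keepdim=True).values
    exp_shifted = (logits - max_logit).exp()
    # The extra exp(-max) term is the implicit zero logit after the shift.
    denom = exp_shifted.sum(dim=dim, keepdim=True) + (-max_logit).exp()
    return exp_shifted / denom


# Example: attention weights over 4 tokens; rows may sum to less than 1.
scores = torch.randn(2, 4)
weights = softmax_1(scores, dim=-1)
print(weights.sum(dim=-1))  # values in (0, 1), not pinned to exactly 1
```

In a standard attention layer, this function would replace the usual `softmax` applied to the scaled query-key scores; the paper's outlier-efficient Hopfield layer generalizes this retrieval rule within a dense associative memory model.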
Cite
Text
Luo et al. "OutEffHop: A Principled Outlier-Efficient Attention Layer from Dense Associative Memory Models." ICML 2024 Workshops: ES-FoMo-II, 2024.
Markdown
[Luo et al. "OutEffHop: A Principled Outlier-Efficient Attention Layer from Dense Associative Memory Models." ICML 2024 Workshops: ES-FoMo-II, 2024.](https://mlanthology.org/icmlw/2024/luo2024icmlw-outeffhop/)
BibTeX
@inproceedings{luo2024icmlw-outeffhop,
title = {{OutEffHop: A Principled Outlier-Efficient Attention Layer from Dense Associative Memory Models}},
author = {Luo, Haozheng and Hu, Jerry Yao-Chieh and Chang, Pei-Hsuan and Chen, Hong-Yu and Li, Weijian and Wang, Wei-Po and Liu, Han},
booktitle = {ICML 2024 Workshops: ES-FoMo-II},
year = {2024},
url = {https://mlanthology.org/icmlw/2024/luo2024icmlw-outeffhop/}
}