Scavenging Hyena: Distilling Transformers into Long Convolution Models
Abstract
The rapid evolution of Large Language Models (LLMs), epitomized by architectures like GPT-4, has reshaped the landscape of natural language processing. This paper introduces a pioneering approach to address the efficiency concerns associated with LLM pre-training, proposing the use of knowledge distillation for cross-architecture transfer. Leveraging insights from the efficient Hyena mechanism, our method replaces attention heads in transformer models with the Hyena operator, offering a cost-effective alternative to traditional pre-training while addressing the challenge of processing long contextual information that is inherent to quadratic attention mechanisms. Unlike conventional compression-focused methods, our technique not only enhances inference speed but also surpasses pre-training in terms of both accuracy and efficiency. In the era of evolving LLMs, our work contributes to the pursuit of sustainable AI solutions, striking a balance between computational power and environmental impact.
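The abstract describes swapping attention blocks for Hyena-style long-convolution operators and transferring the teacher's knowledge by distillation. The sketch below is a minimal illustration of that idea, not the authors' implementation: the names LongConvOperator and distill_step, the single explicit per-channel kernel, and the MSE matching objective are assumptions made for clarity, and the real Hyena operator uses implicitly parameterized kernels and multiplicative gating that are omitted here.

import torch
import torch.nn as nn
import torch.nn.functional as F


class LongConvOperator(nn.Module):
    """Simplified Hyena-style block: a per-channel long convolution computed
    with FFTs, used as a drop-in replacement for a self-attention block.
    (Illustrative only; real Hyena uses implicit kernels and gating.)"""

    def __init__(self, d_model: int, max_seq_len: int):
        super().__init__()
        # One explicit learned kernel per channel, spanning the full sequence.
        self.kernel = nn.Parameter(0.02 * torch.randn(d_model, max_seq_len))
        self.in_proj = nn.Linear(d_model, d_model)
        self.out_proj = nn.Linear(d_model, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model)
        _, t, _ = x.shape
        u = self.in_proj(x).transpose(1, 2)      # (batch, d_model, seq_len)
        k = self.kernel[:, :t]                   # truncate kernel to seq_len
        # Causal convolution via FFT: O(L log L) instead of O(L^2) attention.
        fft_len = 2 * t
        y = torch.fft.irfft(
            torch.fft.rfft(u, n=fft_len) * torch.fft.rfft(k, n=fft_len),
            n=fft_len,
        )[..., :t]
        return self.out_proj(y.transpose(1, 2))  # back to (batch, seq_len, d_model)


def distill_step(teacher_block, student_block, x: torch.Tensor) -> torch.Tensor:
    """One layer-wise distillation step: the frozen teacher block (attention)
    provides the target and the student (long convolution) is trained to
    reproduce it. MSE matching is an assumption, not the paper's objective."""
    with torch.no_grad():
        target = teacher_block(x)                # teacher output, no gradients
    return F.mse_loss(student_block(x), target)


if __name__ == "__main__":
    d_model, seq_len = 64, 128
    x = torch.randn(2, seq_len, d_model)
    # Stand-in teacher: any module mapping (batch, seq_len, d_model) -> same shape,
    # e.g. a wrapper around the original transformer's attention block.
    teacher = nn.Sequential(nn.Linear(d_model, d_model), nn.GELU(),
                            nn.Linear(d_model, d_model)).eval()
    student = LongConvOperator(d_model, max_seq_len=seq_len)
    opt = torch.optim.Adam(student.parameters(), lr=1e-3)
    loss = distill_step(teacher, student, x)
    loss.backward()
    opt.step()
    print(f"distillation loss: {loss.item():.4f}")

The FFT-based convolution is what gives the student block the sub-quadratic scaling in sequence length that motivates the swap; the layer-wise teacher-matching loss stands in for whatever distillation objective the paper actually uses.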
Cite
Text
Ralambomihanta et al. "Scavenging Hyena: Distilling Transformers into Long Convolution Models." ICML 2024 Workshops: ES-FoMo-II, 2024.

Markdown
[Ralambomihanta et al. "Scavenging Hyena: Distilling Transformers into Long Convolution Models." ICML 2024 Workshops: ES-FoMo-II, 2024.](https://mlanthology.org/icmlw/2024/ralambomihanta2024icmlw-scavenging/)

BibTeX
@inproceedings{ralambomihanta2024icmlw-scavenging,
  title     = {{Scavenging Hyena: Distilling Transformers into Long Convolution Models}},
  author    = {Ralambomihanta, Tokiniaina Raharison and Mohammadzadeh, Shahrad and Islam, Sami Nur and Jabbour, Wassim and Liang, Laurence},
  booktitle = {ICML 2024 Workshops: ES-FoMo-II},
  year      = {2024},
  url       = {https://mlanthology.org/icmlw/2024/ralambomihanta2024icmlw-scavenging/}
}