Hymba: A Hybrid-Head Architecture for Small Language Models

Abstract

We propose Hymba, a family of small language models featuring a hybrid-head parallel architecture that integrates attention mechanisms and state space models (SSMs) within the same layer, offering parallel and complementary processing of the same inputs. In this hybrid-head module, attention heads provide high-resolution recall, while SSM heads facilitate efficient context summarization. Additionally, we introduce learnable meta tokens, which are prepended to prompts to store critical meta information, guiding subsequent tokens and alleviating the “forced-to-attend” burden associated with attention mechanisms. Thanks to the global context summarized by SSMs, the attention heads in our model can be further optimized through cross-layer key-value (KV) sharing and a mix of global and local attention, resulting in a compact cache size without compromising accuracy. Notably, Hymba achieves state-of-the-art performance among small LMs: Our Hymba-1.5B-Base model surpasses all sub-2B public models and even outperforms Llama-3.2-3B, achieving 1.32% higher average accuracy, an 11.67× reduction in cache size, and 3.49× higher throughput.
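To make the hybrid-head idea concrete, here is a minimal, hypothetical PyTorch sketch of one hybrid-head block. It substitutes a simplified diagonal linear recurrence for the Mamba-style SSM heads, omits the KV-sharing and global/local attention optimizations, and fuses the two branches with normalized, learned per-channel scales; names such as `HybridHeadBlock` and `SimpleSSMHead` are illustrative, not the authors' implementation.

```python
import torch
import torch.nn as nn


class SimpleSSMHead(nn.Module):
    """Stand-in for a Mamba-style SSM head: a diagonal linear recurrence
    h_t = a * h_{t-1} + B(x_t), y_t = C(h_t), run as an explicit scan."""
    def __init__(self, dim, state_dim=16):
        super().__init__()
        self.in_proj = nn.Linear(dim, state_dim)
        self.out_proj = nn.Linear(state_dim, dim)
        self.log_a = nn.Parameter(torch.zeros(state_dim))  # decay in (0, 1) via sigmoid

    def forward(self, x):                       # x: (B, T, dim)
        a = torch.sigmoid(self.log_a)           # per-channel decay
        u = self.in_proj(x)                     # (B, T, state_dim)
        h = torch.zeros(x.size(0), u.size(-1), device=x.device)
        ys = []
        for t in range(x.size(1)):              # constant-size state: no growing KV cache
            h = a * h + u[:, t]
            ys.append(h)
        return self.out_proj(torch.stack(ys, dim=1))  # (B, T, dim)


class HybridHeadBlock(nn.Module):
    """Hypothetical sketch: attention heads and SSM heads read the same
    input in parallel, and their normalized outputs are fused with
    learned scales. Learnable meta tokens are prepended to the sequence."""
    def __init__(self, dim, n_heads=4, n_meta=8):
        super().__init__()
        self.meta = nn.Parameter(torch.randn(1, n_meta, dim) * 0.02)
        self.attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)
        self.ssm = SimpleSSMHead(dim)
        self.norm_a = nn.LayerNorm(dim)
        self.norm_s = nn.LayerNorm(dim)
        self.beta_a = nn.Parameter(torch.ones(dim))  # learned fusion scales
        self.beta_s = nn.Parameter(torch.ones(dim))
        self.out_proj = nn.Linear(dim, dim)

    def forward(self, x):                       # x: (B, T, dim)
        meta = self.meta.expand(x.size(0), -1, -1)
        x = torch.cat([meta, x], dim=1)         # prepend meta tokens
        T = x.size(1)
        causal = torch.triu(torch.ones(T, T, dtype=torch.bool, device=x.device), 1)
        attn_out, _ = self.attn(x, x, x, attn_mask=causal)  # high-resolution recall
        ssm_out = self.ssm(x)                   # efficient context summarization
        fused = self.beta_a * self.norm_a(attn_out) + self.beta_s * self.norm_s(ssm_out)
        return self.out_proj(fused)[:, meta.size(1):]  # drop meta positions


block = HybridHeadBlock(dim=64)
y = block(torch.randn(2, 10, 64))  # -> (2, 10, 64)
```

The key design point the sketch preserves is that both head types see the same input within a single layer, rather than alternating attention and SSM layers sequentially, so each branch can compensate for the other's weaknesses on every token.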

Cite

Text

Dong et al. "Hymba: A Hybrid-Head Architecture for Small Language Models." International Conference on Learning Representations, 2025.

Markdown

[Dong et al. "Hymba: A Hybrid-Head Architecture for Small Language Models." International Conference on Learning Representations, 2025.](https://mlanthology.org/iclr/2025/dong2025iclr-hymba/)

BibTeX

@inproceedings{dong2025iclr-hymba,
  title     = {{Hymba: A Hybrid-Head Architecture for Small Language Models}},
  author    = {Dong, Xin and Fu, Yonggan and Diao, Shizhe and Byeon, Wonmin and Chen, Zijia and Mahabaleshwarkar, Ameya Sunil and Liu, Shih-Yang and Van Keirsbilck, Matthijs and Chen, Min-Hung and Suhara, Yoshi and Lin, Yingyan Celine and Kautz, Jan and Molchanov, Pavlo},
  booktitle = {International Conference on Learning Representations},
  year      = {2025},
  url       = {https://mlanthology.org/iclr/2025/dong2025iclr-hymba/}
}