Equilibrium Language Models
Abstract
Large Language Models (LLMs) excel across diverse applications but remain impractical for edge deployment because of severe memory bottlenecks on edge devices. We propose Equilibrium Language Models (ELMs), a compression framework that replaces groups of Transformer layers with a lightweight fixed-point network, reinterpreting deep computation as solving for an equilibrium state. To build ELMs, we introduce *Group Pruning Policy Optimization*, which automatically learns optimal pruning intervals. We further propose *One-Step KV-Cache*, which drastically reduces memory overhead by storing only the cache from the final iteration without compromising accuracy, enabling effective deployment on edge devices. Across tasks including commonsense reasoning, mathematical problem solving, and code generation, ELMs prune 28% of parameters while retaining 99% of the accuracy of dense fine-tuned LLMs, establishing a new direction for memory-efficient edge deployment of large models.
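The abstract describes replacing a group of layers with a fixed-point network whose output is an equilibrium state z* satisfying z* = f(z*, x). The sketch below is not the authors' implementation. It assumes a generic deep-equilibrium-style formulation solved by plain fixed-point (Picard) iteration. The toy layer `tanh(Wz + x)`, the helper names `fixed_point_solve` and `toy_layer`, and the tolerance values are all hypothetical. The layer is kept contractive (small `W`) so that the iteration is guaranteed to converge.

```python
import math
from typing import Callable, List, Tuple

Vector = List[float]
Layer = Callable[[Vector, Vector], Vector]


def toy_layer(W: List[Vector]) -> Layer:
    """Hypothetical weight-tied layer f(z, x) = tanh(W z + x).

    tanh is 1-Lipschitz, so f is a contraction in z whenever W has norm < 1.
    """
    def f(z: Vector, x: Vector) -> Vector:
        return [
            math.tanh(sum(w * zj for w, zj in zip(row, z)) + xi)
            for row, xi in zip(W, x)
        ]
    return f


def fixed_point_solve(f: Layer, x: Vector, tol: float = 1e-8,
                      max_iter: int = 100) -> Tuple[Vector, int]:
    """Iterate z <- f(z, x) from z = 0 until the L2 step size drops below tol.

    Returns the approximate equilibrium state and the number of iterations used.
    Only the current iterate is kept; earlier iterates are discarded.
    """
    z = [0.0] * len(x)
    for k in range(1, max_iter + 1):
        z_next = f(z, x)
        step = math.sqrt(sum((a - b) ** 2 for a, b in zip(z_next, z)))
        z = z_next
        if step < tol:
            return z, k
    return z, max_iter


if __name__ == "__main__":
    W = [[0.2, -0.1], [0.05, 0.3]]  # hypothetical weights, row-sum norm 0.35 < 1
    x = [0.5, -0.25]                # hypothetical input to the layer group
    f = toy_layer(W)
    z_star, iters = fixed_point_solve(f, x)
    residual = max(abs(a - b) for a, b in zip(f(z_star, x), z_star))
    print(f"converged in {iters} iterations, residual {residual:.2e}")
```

Because the solver overwrites `z` at every step, its memory cost does not grow with the number of iterations. Our reading of One-Step KV-Cache is analogous: a cache taken from the final iteration alone needs storage for one iteration rather than for all `iters` of them. This is an interpretation of the abstract, not a detail it confirms.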
Cite
Text
Jiang et al. "Equilibrium Language Models." International Conference on Learning Representations, 2026.
Markdown
[Jiang et al. "Equilibrium Language Models." International Conference on Learning Representations, 2026.](https://mlanthology.org/iclr/2026/jiang2026iclr-equilibrium/)
BibTeX
@inproceedings{jiang2026iclr-equilibrium,
title = {{Equilibrium Language Models}},
author = {Jiang, Yikun and Wang, Huanyu and Ding, Tianhong and Zhang, Wenhu and Wu, Yiming and Zhao, Hanbin and Lui, John C.S.},
booktitle = {International Conference on Learning Representations},
year = {2026},
url = {https://mlanthology.org/iclr/2026/jiang2026iclr-equilibrium/}
}