Learning Semi-Structured Sparsity for LLMs via Shared and Context-Aware Hypernetwork

Sun, Lu; Sakuma, Jun

ICLR 2026

Abstract

Large Language Models (LLMs) achieve state-of-the-art performance but are costly to deploy in resource-constrained environments. Pruning with $n:m$ semi-structured sparsity reduces computation and enables hardware acceleration, yet existing methods face a trade-off: one-shot approaches are efficient but heuristic, while optimization-based methods are accurate but expensive. We introduce HyperPrune, a resource-efficient framework that directly optimizes $n:m$ sparsity. A lightweight hypernetwork, shared across layers and conditioned on learnable embeddings, generates structured masks in a one-shot, layer-wise manner. Continual pruning preserves cross-layer knowledge, and feature outlier regularization retains critical activations, unifying the strengths of heuristic and optimization-based methods. Experiments on LLaMA-7B to 70B show state-of-the-art accuracy–sparsity trade-offs on a single A100 GPU, achieving higher efficiency, accuracy, and scalability than prior approaches. HyperPrune offers a practical, scalable, and hardware-friendly solution for structured LLM pruning.
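
To make the $n:m$ constraint mentioned in the abstract concrete, the following is a minimal NumPy sketch of a mask in which exactly n of every m consecutive weights are kept. It uses plain magnitude ranking, which is a simple one-shot heuristic chosen here only for illustration; it is not HyperPrune's hypernetwork-generated mask. The function name `nm_mask` and the example matrix `W` are hypothetical.

```python
import numpy as np

def nm_mask(weights: np.ndarray, n: int = 2, m: int = 4) -> np.ndarray:
    """Return a 0/1 mask keeping the n largest-magnitude entries
    in every group of m consecutive columns (illustrative heuristic only)."""
    rows, cols = weights.shape
    if cols % m != 0:
        raise ValueError("number of columns must be divisible by m")
    # View each row as (cols // m) groups of m weights.
    groups = np.abs(weights).reshape(rows, cols // m, m)
    # Indices of the n largest magnitudes inside each group.
    keep = np.argsort(groups, axis=-1)[..., -n:]
    mask = np.zeros_like(groups, dtype=np.int8)
    np.put_along_axis(mask, keep, 1, axis=-1)
    return mask.reshape(rows, cols)

# Hypothetical 1 x 8 weight matrix, pruned to 2:4 sparsity.
W = np.array([[0.1, -0.9, 0.4, 0.05, 2.0, -0.3, 0.2, 1.1]])
M = nm_mask(W, n=2, m=4)
W_sparse = W * M
```

Each group of four columns in `M` contains exactly two ones, which is the 2:4 pattern that sparse tensor hardware such as the A100 can accelerate. A learned approach would replace the magnitude ranking with scores produced by an optimized model while keeping the same per-group constraint.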

Cite

Text

Sun and Sakuma. "Learning Semi-Structured Sparsity for LLMs via Shared and Context-Aware Hypernetwork." International Conference on Learning Representations, 2026.

Markdown

[Sun and Sakuma. "Learning Semi-Structured Sparsity for LLMs via Shared and Context-Aware Hypernetwork." International Conference on Learning Representations, 2026.](https://mlanthology.org/iclr/2026/sun2026iclr-learning/)

BibTeX

@inproceedings{sun2026iclr-learning,
  title     = {{Learning Semi-Structured Sparsity for LLMs via Shared and Context-Aware Hypernetwork}},
  author    = {Sun, Lu and Sakuma, Jun},
  booktitle = {International Conference on Learning Representations},
  year      = {2026},
  url       = {https://mlanthology.org/iclr/2026/sun2026iclr-learning/}
}