A Simple Linear Patch Revives Layer-Pruned Large Language Models
Abstract
Layer pruning has emerged as a widely used technique for compressing large language models (LLMs). However, existing layer pruning approaches often incur substantial performance degradation. We attribute the majority of this degradation to a single yet previously overlooked issue: \textit{the mismatch of activation magnitudes at the pruning interface}. The pre-interface activations exhibit significantly different scales from the post-interface ones, causing a distributional shift that propagates through the remaining layers. To address this issue, we introduce \textsc{LinearPatch}, a lightweight and plug-and-play technique that fuses two operations into a single matrix multiplication at the pruning interface: (i) a Hadamard transformation that suppresses massive outliers at particular tokens and (ii) a channel-wise scaling that aligns activation statistics. On LLaMA-3-8B, \textsc{LinearPatch} preserves up to \textbf{94.15\%} of the original model's performance when pruning 5 out of 32 layers, outperforming the previous state of the art by \textbf{4\%}. The patch can be further refined with 5K unlabeled samples via memory-efficient offline distillation, pushing the retention to 95.16\% within only 30 minutes on a single GPU. Code is available at \url{https://github.com/chenxinrui-tsinghua/LinearPatch}.
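To make the fused-patch idea concrete, below is a minimal PyTorch sketch of how such a patch matrix could be constructed. It assumes the patch takes the form $H\,\mathrm{diag}(s)\,H^\top$, with $H$ an orthonormal Hadamard matrix and $s$ a per-channel scale estimated from calibration activations collected before and after the pruned block; the exact construction and statistics used in the paper may differ. The function names and the calibration inputs (`pre_acts`, `post_acts`) are hypothetical.

```python
import torch


def hadamard_matrix(n: int) -> torch.Tensor:
    """Orthonormal 2^k x 2^k Hadamard matrix via the Sylvester construction."""
    assert n & (n - 1) == 0, "dimension must be a power of two (e.g., 4096 for LLaMA-3-8B)"
    H = torch.ones(1, 1)
    while H.shape[0] < n:
        H = torch.cat([torch.cat([H, H], dim=1),
                       torch.cat([H, -H], dim=1)], dim=0)
    return H / n ** 0.5  # normalized so that H @ H.T = I


def build_linear_patch(pre_acts: torch.Tensor, post_acts: torch.Tensor) -> torch.Tensor:
    """Hypothetical construction of a LinearPatch-style matrix.

    pre_acts / post_acts: calibration activations gathered just before and just
    after the pruned block, each of shape (num_tokens, hidden_dim).
    Returns a (hidden_dim, hidden_dim) matrix applied at the pruning interface
    as x_patched = x_pre @ P.
    """
    d = pre_acts.shape[-1]
    H = hadamard_matrix(d)
    # Rotate into the Hadamard basis, where token-wise massive outliers are spread out.
    pre_rot, post_rot = pre_acts @ H, post_acts @ H
    # Channel-wise scaling that aligns per-channel activation magnitudes (one possible statistic).
    scale = post_rot.abs().mean(dim=0) / pre_rot.abs().mean(dim=0).clamp_min(1e-6)
    # Fuse rotate -> scale -> rotate back into a single matrix multiplication.
    return H @ torch.diag(scale) @ H.T
```

The resulting matrix can be merged into the adjacent projection weights at the pruning interface, so inference incurs no extra matrix multiplication beyond what the abstract describes.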
Cite
Chen et al. "A Simple Linear Patch Revives Layer-Pruned Large Language Models." Advances in Neural Information Processing Systems, 2025.
BibTeX
@inproceedings{chen2025neurips-simple,
  title     = {{A Simple Linear Patch Revives Layer-Pruned Large Language Models}},
  author    = {Chen, Xinrui and Bai, Haoli and Yuan, Tao and Liu, Ruikang and Zhao, Kang and Yu, Xianzhi and Hou, Lu and Guan, Tian and He, Yonghong and Yuan, Chun},
  booktitle = {Advances in Neural Information Processing Systems},
  year      = {2025},
  url       = {https://mlanthology.org/neurips/2025/chen2025neurips-simple/}
}