ICP: Immediate Compensation Pruning for Mid-to-High Sparsity

Abstract

The increasing adoption of large models of up to 7 billion parameters in both the language and vision domains makes inference feasible on a single consumer-grade GPU, yet fine-tuning models at this scale, especially 7B models, remains challenging. This limits the applicability of pruning methods that require full fine-tuning. Meanwhile, pruning methods that require no fine-tuning perform well at low sparsity levels (10%-50%) but struggle at mid-to-high sparsity levels (50%-70%), where the error is equivalent to that of semi-structured pruning. To address these issues, this paper introduces ICP, which strikes a balance between full fine-tuning and zero fine-tuning. First, Sparsity Rearrange reorganizes the predefined sparsity levels; then Block-wise Compensate Pruning alternates pruning and compensation on the model's backbone, fully exploiting intermediate inference results while avoiding full-model fine-tuning. Experiments show that ICP outperforms baselines at mid-to-high sparsity levels, with only a slight increase in pruning time and no additional peak memory overhead.
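To make the alternating prune-then-compensate idea concrete, here is a minimal sketch for a single weight matrix: prune by magnitude, then re-fit the surviving weights on a small calibration batch so the layer's output is preserved. The function name, the global magnitude criterion, and the per-row least-squares fit are illustrative assumptions for this sketch, not the paper's exact block-wise procedure.

```python
import numpy as np

def prune_and_compensate(W, X, sparsity):
    """Prune W (out, in) to the target sparsity by global magnitude,
    then refit the surviving weights on calibration inputs X (samples, in)
    so that X @ W_new.T approximates the dense output X @ W.T.
    Illustrative sketch only -- not the paper's exact algorithm."""
    k = int(W.size * sparsity)
    # Global magnitude threshold: the k smallest-magnitude weights are pruned.
    thresh = np.sort(np.abs(W), axis=None)[k]
    mask = np.abs(W) >= thresh
    Y = X @ W.T                      # dense outputs we want to preserve
    W_new = np.zeros_like(W)
    for i in range(W.shape[0]):      # compensate each output channel
        keep = mask[i]
        if keep.any():
            # Least-squares fit of the surviving weights to the dense output.
            sol, *_ = np.linalg.lstsq(X[:, keep], Y[:, i], rcond=None)
            W_new[i, keep] = sol
    return W_new, mask
```

Because the compensated weights minimize the reconstruction error on the calibration batch, they can only match or improve on simply zeroing the pruned entries; ICP applies this kind of compensation block by block along the backbone rather than per matrix.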

Cite

Text

Luo et al. "ICP: Immediate Compensation Pruning for Mid-to-High Sparsity." Conference on Computer Vision and Pattern Recognition, 2025. doi:10.1109/CVPR52734.2025.00886

Markdown

[Luo et al. "ICP: Immediate Compensation Pruning for Mid-to-High Sparsity." Conference on Computer Vision and Pattern Recognition, 2025.](https://mlanthology.org/cvpr/2025/luo2025cvpr-icp/) doi:10.1109/CVPR52734.2025.00886

BibTeX

@inproceedings{luo2025cvpr-icp,
  title     = {{ICP: Immediate Compensation Pruning for Mid-to-High Sparsity}},
  author    = {Luo, Xin and Fu, Xueming and Jiang, Zihang and Zhou, S. Kevin},
  booktitle = {Conference on Computer Vision and Pattern Recognition},
  year      = {2025},
  pages     = {9487--9496},
  doi       = {10.1109/CVPR52734.2025.00886},
  url       = {https://mlanthology.org/cvpr/2025/luo2025cvpr-icp/}
}