Can Compressed LLMs Truly Act? An Empirical Evaluation of Agentic Capabilities in LLM Compression

Abstract

Post-training compression reduces the computational and memory costs of large language models (LLMs), enabling resource-efficient deployment. However, existing compression benchmarks focus narrowly on language modeling (e.g., perplexity) and natural language understanding tasks (e.g., GLUE accuracy), ignoring agentic capabilities: workflow generation, tool use and function calling, long-context understanding, and real-world application. We introduce the Agent Compression Benchmark (ACBench), the first comprehensive benchmark for evaluating how compression affects LLMs' agentic abilities. ACBench spans (1) 12 tasks across 4 capabilities (e.g., WorfBench for workflow generation, Needle-in-Haystack for long-context retrieval), (2) 4-bit quantization (GPTQ, AWQ) and 50% pruning (Wanda, SparseGPT), and (3) 15 models, including small (Gemma-2B), standard (Qwen2.5-7B), and distilled reasoning LLMs (DeepSeek-R1-Distill). Our experiments reveal compression tradeoffs: 4-bit quantization preserves workflow generation and tool use (1%–3% drop) but degrades real-world application accuracy by 10%–15%. We introduce ERank, Top-k Ranking Correlation, and Energy to systematize analysis. ACBench provides actionable insights for optimizing LLM compression in agentic scenarios, bridging the gap between algorithmic efficiency and real-world applicability.
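The page does not include code, but to make the Top-k Ranking Correlation metric named above concrete, here is a minimal Python sketch of one plausible formulation: Kendall's tau computed over tasks that appear in the top-k of both a full-precision model's and a compressed model's score rankings. The exact definition used in ACBench may differ; the restriction to the shared top-k set and all names and numbers below are illustrative assumptions, not the paper's implementation.

```python
# Hypothetical sketch of a Top-k Ranking Correlation between per-task
# scores of a full-precision model and its compressed counterpart.
# Assumption: Kendall's tau over tasks ranked top-k by both models;
# ACBench's actual definition may differ.
from scipy.stats import kendalltau

def topk_rank_correlation(full_scores: dict, compressed_scores: dict, k: int = 5) -> float:
    """Kendall's tau on tasks that appear in both models' top-k rankings."""
    top_full = sorted(full_scores, key=full_scores.get, reverse=True)[:k]
    top_comp = sorted(compressed_scores, key=compressed_scores.get, reverse=True)[:k]
    shared = [t for t in top_full if t in top_comp]  # tasks in both top-k lists
    if len(shared) < 2:
        return float("nan")  # correlation undefined with fewer than 2 shared tasks
    tau, _ = kendalltau(
        [full_scores[t] for t in shared],
        [compressed_scores[t] for t in shared],
    )
    return tau

# Illustrative (fabricated) scores: agreement between FP16 and 4-bit rankings.
fp16 = {"workflow": 0.82, "tool_use": 0.78, "long_ctx": 0.71, "real_world": 0.65}
gptq4 = {"workflow": 0.80, "tool_use": 0.77, "long_ctx": 0.69, "real_world": 0.55}
print(topk_rank_correlation(fp16, gptq4, k=3))  # 1.0: rankings fully agree in top-3
```

A high tau indicates that compression preserves the relative ordering of capabilities even when absolute scores drop, which is the kind of ranking-level analysis the abstract describes.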

Cite

Text

Dong et al. "Can Compressed LLMs Truly Act? An Empirical Evaluation of Agentic Capabilities in LLM Compression." Proceedings of the 42nd International Conference on Machine Learning, 2025.

Markdown

[Dong et al. "Can Compressed LLMs Truly Act? An Empirical Evaluation of Agentic Capabilities in LLM Compression." Proceedings of the 42nd International Conference on Machine Learning, 2025.](https://mlanthology.org/icml/2025/dong2025icml-compressed/)

BibTeX

@inproceedings{dong2025icml-compressed,
  title     = {{Can Compressed LLMs Truly Act? An Empirical Evaluation of Agentic Capabilities in LLM Compression}},
  author    = {Dong, Peijie and Tang, Zhenheng and Liu, Xiang and Li, Lujun and Chu, Xiaowen and Li, Bo},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  year      = {2025},
  pages     = {14169--14202},
  volume    = {267},
  url       = {https://mlanthology.org/icml/2025/dong2025icml-compressed/}
}