STBLLM: Breaking the 1-Bit Barrier with Structured Binary LLMs

Abstract

In this paper, we present the first structural binarization method for LLM compression to less than 1-bit precision. Although LLMs have achieved remarkable performance, their memory-bound nature during inference hinders their deployment on resource-constrained devices. Reducing weights to 1-bit precision through binarization substantially enhances computational efficiency. We observe that randomly flipping some weights in binarized LLMs does not significantly degrade the model's performance, suggesting the potential for further compression. To exploit this, our STBLLM employs an N:M sparsity technique to achieve structural binarization of the weights. Specifically, we introduce a novel Standardized Importance (SI) metric, which considers weight magnitude and input feature norm to more accurately assess weight significance. Then, we propose a layer-wise approach that allows different layers of the LLM to be sparsified with varying N:M ratios, thereby balancing compression and accuracy. Furthermore, we implement a fine-grained grouping strategy for less important weights, applying distinct quantization schemes to sparse, intermediate, and dense regions. Finally, we design a specialized CUDA kernel to support structural binarization. We conduct extensive experiments on the LLaMA, OPT, and Mistral families. STBLLM achieves a perplexity of 11.07 at 0.55 bits per weight, outperforming BiLLM by 3×. The results demonstrate that our approach performs better than other binarized LLM compression methods while significantly reducing memory requirements. Code is released at https://github.com/pprp/STBLLM.
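The core idea in the abstract — score each weight, keep the top N of every M consecutive weights, and binarize the survivors — can be sketched as follows. This is a minimal NumPy illustration, not the paper's implementation: the importance score `|W| * x_norm` is a plausible stand-in for the Standardized Importance metric (the abstract only says it combines weight magnitude and input feature norm), and the per-row scale `alpha` is a common binarization choice, not necessarily STBLLM's.

```python
import numpy as np

def nm_structured_binarize(W, x_norm, n=2, m=4):
    """Sketch of N:M structured binarization.

    For every group of m consecutive weights along the input dimension,
    keep the n entries with the highest importance score and zero the
    rest, then binarize the survivors to {-alpha, +alpha} per row.

    Assumptions (not from the paper): importance = |W| * input feature
    norm, and alpha = mean |surviving weight| per output row.
    """
    rows, cols = W.shape
    assert cols % m == 0, "input dimension must be divisible by m"
    # hypothetical importance score: magnitude scaled by input feature norm
    score = np.abs(W) * x_norm[None, :]
    groups = score.reshape(rows, cols // m, m)
    # indices of the (m - n) least important weights in each group
    drop = np.argsort(groups, axis=-1)[..., : m - n]
    mask = np.ones_like(groups, dtype=bool)
    np.put_along_axis(mask, drop, False, axis=-1)
    mask = mask.reshape(rows, cols)
    # zero out pruned weights, then binarize survivors to +/- alpha per row
    W_sparse = W * mask
    alpha = np.abs(W_sparse).sum(axis=1, keepdims=True) / mask.sum(axis=1, keepdims=True)
    W_bin = np.sign(W_sparse) * alpha * mask
    return W_bin, mask
```

With n=2, m=4, exactly half the weights survive; the surviving binary values plus the N:M mask metadata are what push the effective storage below 1 bit per weight.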

Cite

Text

Dong et al. "STBLLM: Breaking the 1-Bit Barrier with Structured Binary LLMs." International Conference on Learning Representations, 2025.

Markdown

[Dong et al. "STBLLM: Breaking the 1-Bit Barrier with Structured Binary LLMs." International Conference on Learning Representations, 2025.](https://mlanthology.org/iclr/2025/dong2025iclr-stbllm/)

BibTeX

@inproceedings{dong2025iclr-stbllm,
  title     = {{STBLLM: Breaking the 1-Bit Barrier with Structured Binary LLMs}},
  author    = {Dong, Peijie and Li, Lujun and Zhong, Yuedong and Du, DaYou and Fan, Ruibo and Chen, Yuhan and Tang, Zhenheng and Wang, Qiang and Xue, Wei and Guo, Yike and Chu, Xiaowen},
  booktitle = {International Conference on Learning Representations},
  year      = {2025},
  url       = {https://mlanthology.org/iclr/2025/dong2025iclr-stbllm/}
}