Amdahl’s Law for LLMs: A Throughput-Centric Analysis of Extreme LLM Quantization

Abstract

The emergence of 1-bit large language models (LLMs) has sparked significant interest, promising substantial efficiency gains through extreme quantization. However, these benefits are inherently limited by the portion of the model that can be quantized. Specifically, 1-bit quantization typically targets only the projection layers, while the attention mechanisms remain in higher precision, potentially creating significant throughput bottlenecks. To quantify this limitation, we present an adaptation of Amdahl's Law specifically tailored to LLMs, offering a quantitative framework for understanding the throughput limits of extreme quantization. Our analysis reveals how improvements in quantization can deliver substantial throughput gains, but only to the extent that they address critical throughput-constrained sections of the model. Through extensive experiments across diverse model architectures and hardware platforms, we highlight key trade-offs and performance ceilings, providing a roadmap for future research aimed at maximizing LLM throughput through more holistic quantization strategies.
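The ceiling the abstract alludes to follows the classical Amdahl's Law form. The sketch below is an illustrative assumption rather than the paper's exact model: it treats the projection layers as a fraction p of inference time that quantization accelerates by a factor s, with the remainder (e.g., attention) left at full precision; the function name and example numbers are hypothetical.

# Minimal sketch (assumed Amdahl's-Law-style bound, not the paper's exact formulation)
def quantization_speedup(p: float, s: float) -> float:
    """Overall speedup when a fraction p of runtime is accelerated by factor s."""
    return 1.0 / ((1.0 - p) + p / s)

# Hypothetical example: projection layers take 70% of runtime and become 8x faster.
print(quantization_speedup(p=0.7, s=8.0))  # ~2.58x overall
# Even as s -> infinity, the gain is capped at 1 / (1 - p) = ~3.33x

Under this reading, making the quantized kernels faster yields diminishing returns once the unquantized attention path dominates the remaining runtime, which is the motivation for the more holistic quantization strategies the abstract calls for.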

Cite

Text

Malekar and Zand. "Amdahl’s Law for LLMs: A Throughput-Centric Analysis of Extreme LLM Quantization." Transactions on Machine Learning Research, 2025.

Markdown

[Malekar and Zand. "Amdahl’s Law for LLMs: A Throughput-Centric Analysis of Extreme LLM Quantization." Transactions on Machine Learning Research, 2025.](https://mlanthology.org/tmlr/2025/malekar2025tmlr-amdahls/)

BibTeX

@article{malekar2025tmlr-amdahls,
  title     = {{Amdahl’s Law for LLMs: A Throughput-Centric Analysis of Extreme LLM Quantization}},
  author    = {Malekar, Jinendra and Zand, Ramtin},
  journal   = {Transactions on Machine Learning Research},
  year      = {2025},
  url       = {https://mlanthology.org/tmlr/2025/malekar2025tmlr-amdahls/}
}