Parallel Scaling Law for Language Models
Abstract
It is commonly believed that scaling language models requires committing a significant space or time cost, by increasing the parameters (parameter scaling) or output tokens (inference-time scaling). We introduce an alternative, more inference-efficient scaling paradigm: increasing the model's parallel computation during both training and inference time. We apply $P$ diverse and learnable transformations to the input, execute forward passes of the model in parallel, and dynamically aggregate the $P$ outputs. This method, which we call parallel scaling (ParScale), scales parallel computation by reusing existing parameters and can be applied to any model structure, optimization procedure, data, or task. We theoretically propose a new scaling law and validate it through large-scale pre-training, which shows that scaling a model to $P$ parallel streams is similar to scaling its parameters by $\mathcal{O}(\log P)$ while showing superior inference efficiency. For example, ParScale can incur up to 22$\times$ less memory increase and 6$\times$ less latency increase than parameter scaling that achieves the same performance improvement. It can also recycle an off-the-shelf pre-trained model into a parallelly scaled one by post-training on a small number of tokens, further reducing the training budget. The new scaling law we discovered potentially facilitates the deployment of more powerful models in low-resource scenarios, and provides an alternative perspective on the role of computation in machine learning. Our code and 67 trained model checkpoints are publicly available at https://github.com/QwenLM/ParScale and https://huggingface.co/ParScale.
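To make the described pipeline concrete, below is a minimal PyTorch sketch of the idea as stated in the abstract: $P$ learnable input transformations, $P$ forward passes through a shared backbone, and a learned dynamic aggregation of the $P$ outputs. The specific choices here (additive learned offsets on the input embeddings, a softmax-weighted sum of logits, and the names `ParallelScaledLM`, `stream_offsets`, `gate`) are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn


class ParallelScaledLM(nn.Module):
    """Sketch of parallel scaling: reuse one backbone across P input streams."""

    def __init__(self, backbone: nn.Module, d_model: int, vocab_size: int, num_streams: int = 4):
        super().__init__()
        self.backbone = backbone              # shared, reused parameters
        self.num_streams = num_streams        # P in the paper's notation
        # One learnable transformation per stream (here: an additive embedding offset).
        self.stream_offsets = nn.Parameter(torch.zeros(num_streams, d_model))
        # Dynamic aggregation: a small gate predicts per-token, per-stream weights.
        self.gate = nn.Linear(vocab_size, 1)

    def forward(self, token_embeddings: torch.Tensor) -> torch.Tensor:
        # token_embeddings: (batch, seq_len, d_model)
        outputs = []
        for p in range(self.num_streams):
            transformed = token_embeddings + self.stream_offsets[p]  # stream-specific input
            outputs.append(self.backbone(transformed))               # (batch, seq_len, vocab)
        stacked = torch.stack(outputs, dim=0)                        # (P, batch, seq_len, vocab)
        # Softmax over the stream dimension gives dynamic, input-dependent weights.
        weights = torch.softmax(self.gate(stacked), dim=0)           # (P, batch, seq_len, 1)
        return (weights * stacked).sum(dim=0)                        # aggregated logits
```

The Python loop over streams is only for clarity; in an actual implementation the $P$ streams would be folded into the batch dimension so the forward passes execute in parallel on the accelerator.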
Cite
Text
Chen et al. "Parallel Scaling Law for Language Models." Advances in Neural Information Processing Systems, 2025.
Markdown
[Chen et al. "Parallel Scaling Law for Language Models." Advances in Neural Information Processing Systems, 2025.](https://mlanthology.org/neurips/2025/chen2025neurips-parallel/)
BibTeX
@inproceedings{chen2025neurips-parallel,
title = {{Parallel Scaling Law for Language Models}},
author = {Chen, Mouxiang and Hui, Binyuan and Cui, Zeyu and Yang, Jiaxi and Liu, Dayiheng and Sun, Jianling and Lin, Junyang and Liu, Zhongxin},
booktitle = {Advances in Neural Information Processing Systems},
year = {2025},
url = {https://mlanthology.org/neurips/2025/chen2025neurips-parallel/}
}