Scaling up Parameter Generation: A Recurrent Diffusion Approach
Abstract
Parameter generation has long struggled to match the scale of today's large vision and language models, curbing its broader utility. In this paper, we introduce Recurrent Diffusion for Large-Scale Parameter Generation (RPG), a novel framework that generates full neural network parameters—up to hundreds of millions—on a single GPU. Our approach first partitions a network's parameters into non-overlapping 'tokens', each corresponding to a distinct portion of the model. A recurrent mechanism then learns the inter-token relationships, producing 'prototypes' which serve as conditions for a diffusion process that ultimately synthesizes the full parameters. Across a spectrum of architectures and tasks—including ResNets, ConvNeXts and ViTs on ImageNet-1K and COCO, and even LoRA-based LLMs—RPG achieves performance on par with fully trained networks while avoiding excessive memory overhead. Notably, it generalizes beyond its training set to generate valid parameters for previously unseen tasks, highlighting its flexibility in dynamic and open-ended scenarios. By overcoming the longstanding memory and scalability barriers, RPG serves as a critical advance in 'AI generating AI', potentially enabling efficient weight generation at scales previously deemed infeasible.
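The generation pipeline sketched above (partition parameters into tokens, model inter-token structure recurrently to obtain prototypes, then run a diffusion process conditioned on those prototypes) can be illustrated with a short PyTorch training step. Everything below is an assumption made for illustration: the token size, the GRU standing in for the paper's recurrent mechanism, the MLP denoiser, and the cosine noise schedule are stand-ins, not the authors' implementation.

# Minimal, illustrative sketch of one RPG-style training step.
# All sizes, module choices, and the noise schedule are assumptions.
import torch
import torch.nn as nn

TOKEN_SIZE = 256  # assumed length of each parameter 'token'
PROTO_DIM = 512   # assumed prototype / hidden dimension

def tokenize(params: torch.Tensor) -> torch.Tensor:
    """Partition a flat parameter vector into non-overlapping tokens,
    zero-padding the tail so it divides evenly."""
    pad = (-params.numel()) % TOKEN_SIZE
    params = torch.cat([params, params.new_zeros(pad)])
    return params.view(-1, TOKEN_SIZE)  # (num_tokens, TOKEN_SIZE)

class RecurrentPrototyper(nn.Module):
    """Learns inter-token relationships and emits one prototype per token.
    A GRU is a stand-in for whichever recurrent backbone the paper uses."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Linear(TOKEN_SIZE, PROTO_DIM)
        self.rnn = nn.GRU(PROTO_DIM, PROTO_DIM, batch_first=True)

    def forward(self, tokens):  # tokens: (B, T, TOKEN_SIZE)
        protos, _ = self.rnn(self.embed(tokens))
        return protos           # (B, T, PROTO_DIM)

class TokenDenoiser(nn.Module):
    """Per-token denoiser: predicts the noise added to a token at step t,
    conditioned on that token's prototype."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(TOKEN_SIZE + PROTO_DIM + 1, 1024), nn.SiLU(),
            nn.Linear(1024, TOKEN_SIZE),
        )

    def forward(self, noisy_tok, proto, t):
        t = t.expand(noisy_tok.shape[:-1]).unsqueeze(-1).float()
        return self.net(torch.cat([noisy_tok, proto, t], dim=-1))

# One DDPM-style denoising step on a single flattened checkpoint.
params = torch.randn(1_000_000)         # placeholder for trained weights
tokens = tokenize(params).unsqueeze(0)  # (1, T, TOKEN_SIZE)
prototyper, denoiser = RecurrentPrototyper(), TokenDenoiser()
t = torch.randint(0, 1000, (1,))
alpha_bar = torch.cos(t / 1000 * torch.pi / 2) ** 2  # assumed cosine schedule
noise = torch.randn_like(tokens)
noisy = alpha_bar.sqrt() * tokens + (1 - alpha_bar).sqrt() * noise
loss = (denoiser(noisy, prototyper(tokens), t) - noise).pow(2).mean()
loss.backward()

At generation time the reverse diffusion would be run token by token under the same prototypes, and the denoised tokens un-padded and reshaped back into the target network's parameter tensors; generating per token rather than per checkpoint is plausibly what keeps memory manageable as the parameter count grows.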
Cite
Text
Wang et al. "Scaling up Parameter Generation: A Recurrent Diffusion Approach." Advances in Neural Information Processing Systems, 2025.
Markdown
[Wang et al. "Scaling up Parameter Generation: A Recurrent Diffusion Approach." Advances in Neural Information Processing Systems, 2025.](https://mlanthology.org/neurips/2025/wang2025neurips-scaling/)
BibTeX
@inproceedings{wang2025neurips-scaling,
  title = {{Scaling up Parameter Generation: A Recurrent Diffusion Approach}},
  author = {Wang, Kai and Tang, Dongwen and Zhao, Wangbo and Schürholt, Konstantin and Wang, Zhangyang and You, Yang},
  booktitle = {Advances in Neural Information Processing Systems},
  year = {2025},
  url = {https://mlanthology.org/neurips/2025/wang2025neurips-scaling/}
}