DRAW: Domain Weight Randomization with Bayesian Updating for LLM Pre-Training
Abstract
The pre-training data mixture is pivotal for large language model (LLM) performance, but searching for the best domain weights is computationally expensive. We present Domain Weight Randomization with Bayesian Updating (DRAW), a principled framework that treats domain weights as Dirichlet-distributed random variables whose parameters scale with model width. Informative priors are first estimated using proxy models; the main model then refines these priors through Bayesian inference and parameter scaling, dynamically sampling domain weights during training. Theoretically, DRAW reduces generalization error at a rate of $\mathcal{O}(1/\sqrt{n})$ as model width increases, ensuring stable convergence. Empirical results on open-domain corpora and diverse benchmarks show that DRAW reliably outperforms fixed and adaptive baselines in both language modeling and downstream tasks, achieving better average and worst-case performance alongside strong robustness. DRAW not only highlights valuable data domains while suppressing noisy ones, but also introduces a scalable and effective mechanism for adaptive data mixing in LLM pre-training, facilitating efficient knowledge transfer from proxy to large models.
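The general idea of Dirichlet-distributed domain weights with a width-dependent concentration can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the scaling rule or the update rule, so the linear scaling in `scaled_prior`, the constants `base_width` and `base_concentration`, and the `pseudo_counts` evidence are all hypothetical choices made for illustration. Only the conjugate Dirichlet-multinomial update (adding counts to the concentration parameters) and the Gamma-based Dirichlet sampler are standard.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw one sample from Dirichlet(alpha) by normalizing Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def scaled_prior(proxy_weights, width, base_width=256, base_concentration=10.0):
    """Hypothetical scaling rule (an assumption, not from the paper):
    the total concentration grows linearly with model width, so wider
    models sample weights closer to the proxy-estimated mean."""
    concentration = base_concentration * width / base_width
    return [concentration * w for w in proxy_weights]


def bayes_update(alpha, pseudo_counts):
    """Standard conjugate Dirichlet-multinomial update: add counts to alpha."""
    return [a + c for a, c in zip(alpha, pseudo_counts)]


rng = random.Random(0)

# Hypothetical mean domain weights estimated from proxy models.
proxy_weights = [0.5, 0.3, 0.2]

# Prior for a wider main model, then one update with hypothetical evidence.
alpha = scaled_prior(proxy_weights, width=1024)
alpha = bayes_update(alpha, pseudo_counts=[3.0, 1.0, 0.0])

# Domain weights sampled for the next training step.
weights = sample_dirichlet(alpha, rng)
```

Under these assumptions, a larger `width` increases the total concentration, which reduces the variance of the sampled `weights` around their mean.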
Cite
Text
Wang et al. "DRAW: Domain Weight Randomization with Bayesian Updating for LLM Pre-Training." Transactions on Machine Learning Research, 2026.
Markdown
[Wang et al. "DRAW: Domain Weight Randomization with Bayesian Updating for LLM Pre-Training." Transactions on Machine Learning Research, 2026.](https://mlanthology.org/tmlr/2026/wang2026tmlr-draw/)
BibTeX
@article{wang2026tmlr-draw,
title = {{DRAW: Domain Weight Randomization with Bayesian Updating for LLM Pre-Training}},
author = {Wang, Ruonan and Qiao, Yongqi and Xie, Zhonglin and Yuan, Kun},
journal = {Transactions on Machine Learning Research},
year = {2026},
url = {https://mlanthology.org/tmlr/2026/wang2026tmlr-draw/}
}