PiKE: Adaptive Data Mixing for Large-Scale Multi-Task Learning Under Low Gradient Conflicts

Li, Zeman; Deng, Yuan; Zhong, Peilin; Razaviyayn, Meisam; Mirrokni, Vahab

NeurIPS 2025

Abstract

Modern foundation models are trained on diverse datasets to enhance generalization across tasks and domains. A central challenge in this process is determining how to effectively mix and sample data from multiple sources. This naturally leads to a multi-task learning (MTL) perspective. While prior work in MTL has emphasized mitigating gradient conflicts, we observe that large-scale pretraining scenarios—such as multilingual or multi-domain training—often exhibit little to no gradient conflict. Motivated by this observation, we propose $\textbf{PiKE}$ ($\textbf{P}$ositive gradient $\textbf{i}$nteraction-based $\textbf{K}$-task weights $\textbf{E}$stimator), an adaptive data mixing algorithm that dynamically adjusts sampling weights during training. PiKE exploits non-conflicting gradient interactions to minimize a near-tight upper bound on the average loss decrease at each step, while incurring negligible computational overhead. We provide theoretical convergence guarantees and show that PiKE outperforms static and non-adaptive mixing baselines. Furthermore, we extend PiKE to promote balanced learning across tasks. Extensive experiments on large-scale language model pretraining confirm that PiKE achieves faster convergence and improved downstream performance compared to existing approaches.

Cite

Text

Li et al. "PiKE: Adaptive Data Mixing for Large-Scale Multi-Task Learning Under Low Gradient Conflicts." Advances in Neural Information Processing Systems, 2025.

Markdown

[Li et al. "PiKE: Adaptive Data Mixing for Large-Scale Multi-Task Learning Under Low Gradient Conflicts." Advances in Neural Information Processing Systems, 2025.](https://mlanthology.org/neurips/2025/li2025neurips-pike/)

BibTeX

@inproceedings{li2025neurips-pike,
  title     = {{PiKE: Adaptive Data Mixing for Large-Scale Multi-Task Learning Under Low Gradient Conflicts}},
  author    = {Li, Zeman and Deng, Yuan and Zhong, Peilin and Razaviyayn, Meisam and Mirrokni, Vahab},
  booktitle = {Advances in Neural Information Processing Systems},
  year      = {2025},
  url       = {https://mlanthology.org/neurips/2025/li2025neurips-pike/}
}