Source-Optimal Training Is Transfer-Suboptimal
Abstract
We prove that training a source model optimally for its own task is generically suboptimal when the objective is downstream transfer. We study the source-side optimization problem in L2-SP (L2-distance to Starting Point) ridge regression, where the target estimator is regularized toward the source model parameters, and show a fundamental mismatch between the source-optimal regularization $\tau_S^*$ (minimizing source risk) and the transfer-optimal regularization $\tau_0^*$ (maximizing downstream transfer): outside of a measure-zero set, $\tau_0^* \neq \tau_S^*$. We characterize $\tau_0^*$ as a function of the normalized task alignment $\rho = \langle w_0, w_1 \rangle/\|w_0\|^2$ and identify an alignment-dependent reversal: with imperfect alignment ($0<\rho<1$), transfer benefits from stronger source regularization, while in super-aligned regimes ($\rho>1$), transfer benefits from weaker regularization. In isotropic settings, whether transfer helps is independent of target sample size and noise. We verify the phase transition in synthetic experiments across overparameterization ratios and covariance structures, and present nonlinear experiments on MNIST, CIFAR-10, and 20 Newsgroups showing that the mismatch persists in standard transfer learning pipelines, with explicit L2-SP fine-tuning closely tracking standard SGD and the target sample-size independence prediction confirmed empirically.
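As a rough illustration of the setup described in the abstract, the sketch below writes out the standard closed forms for a ridge source estimator and an L2-SP target estimator that is pulled toward it, plus the normalized alignment $\rho$. It is not taken from the paper: the function names, the unnormalized squared-error objective, and the symbol `lam` for the target-side penalty are assumptions made for illustration, and the paper's exact scaling conventions may differ.

```python
import numpy as np

def ridge(X, y, tau):
    """Source estimate: argmin_w ||X w - y||^2 + tau * ||w||^2 (assumed scaling)."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + tau * np.eye(d), X.T @ y)

def l2sp_ridge(X, y, w_start, lam):
    """Target estimate: argmin_w ||X w - y||^2 + lam * ||w - w_start||^2.

    Substituting v = w - w_start turns this into ordinary ridge on the
    residual y - X w_start, which gives the closed form below.
    """
    d = X.shape[1]
    residual = y - X @ w_start
    return w_start + np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ residual)

def alignment(w0, w1):
    """Normalized task alignment rho = <w0, w1> / ||w0||^2."""
    return float(w0 @ w1) / float(w0 @ w0)
```

Under these assumptions, the question the abstract poses amounts to choosing `tau` in `ridge` so that the downstream risk of `l2sp_ridge(X_target, y_target, ridge(X_source, y_source, tau), lam)` is minimized, rather than the source risk of `ridge` itself.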
Cite
Text
Hedges. "Source-Optimal Training Is Transfer-Suboptimal." Transactions on Machine Learning Research, 2026.
Markdown
[Hedges. "Source-Optimal Training Is Transfer-Suboptimal." Transactions on Machine Learning Research, 2026.](https://mlanthology.org/tmlr/2026/hedges2026tmlr-sourceoptimal/)
BibTeX
@article{hedges2026tmlr-sourceoptimal,
title = {{Source-Optimal Training Is Transfer-Suboptimal}},
author = {Hedges, C. Evans},
journal = {Transactions on Machine Learning Research},
year = {2026},
url = {https://mlanthology.org/tmlr/2026/hedges2026tmlr-sourceoptimal/}
}