Functional Scaling Laws in Kernel Regression: Loss Dynamics and Learning Rate Schedules
Abstract
Scaling laws have emerged as a unifying lens for understanding and guiding the training of large language models (LLMs). However, existing studies predominantly focus on the final-step loss, leaving open whether the entire $\textit{loss dynamics}$ obey similar laws and, crucially, how the $\textit{learning rate schedule}$ (LRS) shapes them. We address these gaps in a controlled theoretical setting by analyzing stochastic gradient descent (SGD) on a power-law kernel regression model. The key insight is a novel $\textbf{intrinsic-time}$ viewpoint, which captures the training progress more faithfully than iteration count. We then establish a $\textbf{Functional Scaling Law (FSL)}$ that captures the full loss trajectory under arbitrary LRSs, with the schedule’s influence entering through a simple convolutional functional. We further instantiate the theory for three representative LRSs---constant, exponential decay, and warmup–stable–decay (WSD)---and derive explicit scaling relations in both data- and compute-limited regimes. These comparisons explain key empirical phenomena: (i) higher-capacity models are more data- and compute-efficient; (ii) learning-rate decay improves training efficiency; and (iii) WSD-type schedules outperform pure decay. Finally, experiments on LLMs ranging from 0.1B to 1B parameters demonstrate the practical relevance of FSL as a surrogate model for fitting and predicting loss trajectories in large-scale pre-training.
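For illustration, the three representative learning-rate schedules mentioned in the abstract can be sketched as simple functions of the training step. This is a minimal sketch; the function and parameter names (peak_lr, warmup_frac, decay_rate, etc.) are illustrative assumptions and not notation from the paper.

```python
import numpy as np

def constant_lr(t, total_steps, peak_lr=1e-3):
    """Constant schedule: the learning rate stays at peak_lr throughout training."""
    return peak_lr

def exponential_decay_lr(t, total_steps, peak_lr=1e-3, decay_rate=5.0):
    """Exponential decay: the learning rate shrinks geometrically with training progress."""
    return peak_lr * np.exp(-decay_rate * t / total_steps)

def wsd_lr(t, total_steps, peak_lr=1e-3, warmup_frac=0.05, decay_frac=0.2):
    """Warmup-stable-decay (WSD): linear warmup, constant plateau,
    then a linear decay phase at the end of training."""
    warmup_steps = int(warmup_frac * total_steps)
    decay_steps = int(decay_frac * total_steps)
    stable_end = total_steps - decay_steps
    if t < warmup_steps:
        return peak_lr * t / max(warmup_steps, 1)
    if t < stable_end:
        return peak_lr
    return peak_lr * (total_steps - t) / max(decay_steps, 1)
```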
Cite
Text
Li et al. "Functional Scaling Laws in Kernel Regression: Loss Dynamics and Learning Rate Schedules." Advances in Neural Information Processing Systems, 2025.
Markdown
[Li et al. "Functional Scaling Laws in Kernel Regression: Loss Dynamics and Learning Rate Schedules." Advances in Neural Information Processing Systems, 2025.](https://mlanthology.org/neurips/2025/li2025neurips-functional/)
BibTeX
@inproceedings{li2025neurips-functional,
title = {{Functional Scaling Laws in Kernel Regression: Loss Dynamics and Learning Rate Schedules}},
author = {Li, Binghui and Chen, Fengling and Huang, Zixun and Wang, Lean and Wu, Lei},
booktitle = {Advances in Neural Information Processing Systems},
year = {2025},
url = {https://mlanthology.org/neurips/2025/li2025neurips-functional/}
}