Quantifying Overfitting Along the Regularization Path for Two-Part-Code MDL in Supervised Classification
Abstract
We provide a complete characterization of the entire regularization curve of a modified two-part-code Minimum Description Length (MDL) learning rule for binary classification, based on an arbitrary prior or description language. Grünwald and Langford (2004) previously established the lack of asymptotic consistency, from an agnostic PAC (frequentist worst case) perspective, of the MDL rule with a penalty parameter of $\lambda=1$, suggesting that it under-regularizes. Driven by interest in understanding how benign or catastrophic under-regularization and overfitting might be, we obtain a precise quantitative description of the worst-case limiting error as a function of the regularization parameter $\lambda$ and the noise level (or approximation error), significantly tightening the analysis of Grünwald and Langford for $\lambda=1$ and extending it to all other choices of $\lambda$.
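For orientation, one common way to write a two-part-code MDL rule with a tunable penalty $\lambda$ is sketched below in generic notation (a prior or description language $\pi$, hypotheses $h$, and a sample $S=\{(x_i,y_i)\}_{i=1}^n$, all introduced here for illustration); this is only a schematic, and the paper's modified rule may differ in its exact form:

$$\hat{h}_\lambda \in \operatorname*{arg\,min}_{h}\; \lambda\,\bigl(-\log \pi(h)\bigr) \;+\; n\,\widehat{\mathrm{err}}_S(h), \qquad \widehat{\mathrm{err}}_S(h) = \frac{1}{n}\sum_{i=1}^n \mathbf{1}\bigl[h(x_i)\neq y_i\bigr].$$

Here $-\log \pi(h)$ plays the role of the codelength of the hypothesis, the empirical-error term is a simplified stand-in for the codelength of the labels given $h$, and $\lambda=1$ corresponds to weighting both parts equally, the regime analyzed by Grünwald and Langford (2004); larger $\lambda$ regularizes more heavily.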
Cite
Text
Zhu and Srebro. "Quantifying Overfitting Along the Regularization Path for Two-Part-Code MDL in Supervised Classification." Proceedings of Thirty Eighth Conference on Learning Theory, 2025.
Markdown
[Zhu and Srebro. "Quantifying Overfitting Along the Regularization Path for Two-Part-Code MDL in Supervised Classification." Proceedings of Thirty Eighth Conference on Learning Theory, 2025.](https://mlanthology.org/colt/2025/zhu2025colt-quantifying/)
BibTeX
@inproceedings{zhu2025colt-quantifying,
title = {{Quantifying Overfitting Along the Regularization Path for Two-Part-Code MDL in Supervised Classification}},
author = {Zhu, Xiaohan and Srebro, Nathan},
booktitle = {Proceedings of Thirty Eighth Conference on Learning Theory},
year = {2025},
pages = {6124--6155},
volume = {291},
url = {https://mlanthology.org/colt/2025/zhu2025colt-quantifying/}
}