Quantifying Overfitting Along the Regularization Path for Two-Part-Code MDL in Supervised Classification

Abstract

We provide a complete characterization of the entire regularization curve of a modified two-part-code Minimum Description Length (MDL) learning rule for binary classification, based on an arbitrary prior or description language. Grünwald and Langford (2004) previously established the lack of asymptotic consistency, from an agnostic PAC (frequentist worst-case) perspective, of the MDL rule with a penalty parameter of $\lambda=1$, suggesting that it under-regularizes. Driven by interest in understanding how benign or catastrophic under-regularization and overfitting might be, we obtain a precise quantitative description of the worst-case limiting error as a function of the regularization parameter $\lambda$ and noise level (or approximation error), significantly tightening the analysis of Grünwald and Langford for $\lambda=1$ and extending it to all other choices of $\lambda$.
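
To fix notation for readers unfamiliar with the setup, a penalized two-part-code MDL rule with regularization parameter $\lambda$ can be sketched (in our own notation, with a normalization that may differ from the paper's exact formulation) as

$$\hat{h}_\lambda \;=\; \arg\min_{h} \;\; \widehat{\mathrm{err}}_S(h) \;+\; \lambda\,\frac{|c(h)|}{n},$$

where $|c(h)|$ is the description length of hypothesis $h$ under the chosen prior or description language, $\widehat{\mathrm{err}}_S(h)$ is its empirical error on a training sample $S$ of size $n$, and $\lambda=1$ corresponds to the unmodified MDL rule studied by Grünwald and Langford.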

Cite

Text

Zhu and Srebro. "Quantifying Overfitting Along the Regularization Path for Two-Part-Code MDL in Supervised Classification." Proceedings of Thirty Eighth Conference on Learning Theory, 2025.

Markdown

[Zhu and Srebro. "Quantifying Overfitting Along the Regularization Path for Two-Part-Code MDL in Supervised Classification." Proceedings of Thirty Eighth Conference on Learning Theory, 2025.](https://mlanthology.org/colt/2025/zhu2025colt-quantifying/)

BibTeX

@inproceedings{zhu2025colt-quantifying,
  title     = {{Quantifying Overfitting Along the Regularization Path for Two-Part-Code MDL in Supervised Classification}},
  author    = {Zhu, Xiaohan and Srebro, Nathan},
  booktitle = {Proceedings of Thirty Eighth Conference on Learning Theory},
  year      = {2025},
  pages     = {6124--6155},
  volume    = {291},
  url       = {https://mlanthology.org/colt/2025/zhu2025colt-quantifying/}
}