The Optimal Ridge Penalty for Real-World High-Dimensional Data Can Be Zero or Negative Due to the Implicit Ridge Regularization
Abstract
A conventional wisdom in statistical learning is that large models require strong regularization to prevent overfitting. Here we show that this rule can be violated by linear regression in the underdetermined $n\ll p$ situation under realistic conditions. Using simulations and real-life high-dimensional datasets, we demonstrate that an explicit positive ridge penalty can fail to provide any improvement over the minimum-norm least squares estimator. Moreover, the optimal value of the ridge penalty in this situation can be negative. This happens when the high-variance directions in the predictor space can predict the response variable, which is often the case in real-world high-dimensional data. In this regime, low-variance directions provide an implicit ridge regularization and can make any further positive ridge penalty detrimental. We prove that augmenting any linear model with random covariates and using the minimum-norm estimator is asymptotically equivalent to adding the ridge penalty. We use a spiked covariance model as an analytically tractable example and prove that the optimal ridge penalty in this case is negative when $n\ll p$.
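The equivalence between random-covariate augmentation and an explicit ridge penalty can be checked numerically. The following is a minimal sketch, not the authors' code: it assumes the $q$ added covariates are i.i.d. Gaussian with variance $\lambda/q$, so that their Gram matrix approaches $\lambda I_n$ as $q$ grows. The function names, the problem sizes, and the value of `lam` are illustrative choices rather than values taken from the paper.

```python
import numpy as np

def ridge(X, y, lam):
    """Ridge estimator in its dual form: X^T (X X^T + lam I)^{-1} y."""
    n = X.shape[0]
    return X.T @ np.linalg.solve(X @ X.T + lam * np.eye(n), y)

def min_norm_augmented(X, y, lam, q, rng):
    """Minimum-norm least squares fit after appending q random covariates.

    Assumption of this sketch: the added columns are i.i.d. N(0, lam / q),
    so Z @ Z.T is close to lam * I for large q.
    """
    n, p = X.shape
    Z = rng.normal(scale=np.sqrt(lam / q), size=(n, q))
    X_aug = np.hstack([X, Z])
    # The pseudoinverse gives the minimum-norm solution of the
    # underdetermined augmented system.
    beta_aug = np.linalg.pinv(X_aug) @ y
    # Keep only the coefficients of the original predictors.
    return beta_aug[:p]

rng = np.random.default_rng(0)
n, p, lam = 50, 20, 5.0  # illustrative sizes and penalty
X = rng.normal(size=(n, p))
beta_true = rng.normal(size=p)
y = X @ beta_true + rng.normal(size=n)

beta_ridge = ridge(X, y, lam)

# Relative distance between the two estimators for growing q.
errors = {}
for q in (200, 2000, 20000):
    beta_hat = min_norm_augmented(X, y, lam, q, rng)
    errors[q] = np.linalg.norm(beta_hat - beta_ridge) / np.linalg.norm(beta_ridge)
    print(f"q = {q:6d}: relative difference = {errors[q]:.4f}")
```

The relative difference should shrink as `q` increases, which is the sense in which the low-variance random directions act as an implicit ridge penalty of strength `lam`.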
Cite
Text
Kobak et al. "The Optimal Ridge Penalty for Real-World High-Dimensional Data Can Be Zero or Negative Due to the Implicit Ridge Regularization." Journal of Machine Learning Research, 2020.
Markdown
[Kobak et al. "The Optimal Ridge Penalty for Real-World High-Dimensional Data Can Be Zero or Negative Due to the Implicit Ridge Regularization." Journal of Machine Learning Research, 2020.](https://mlanthology.org/jmlr/2020/kobak2020jmlr-optimal/)
BibTeX
@article{kobak2020jmlr-optimal,
title = {{The Optimal Ridge Penalty for Real-World High-Dimensional Data Can Be Zero or Negative Due to the Implicit Ridge Regularization}},
author = {Kobak, Dmitry and Lomond, Jonathan and Sanchez, Benoit},
journal = {Journal of Machine Learning Research},
year = {2020},
pages = {1-16},
volume = {21},
url = {https://mlanthology.org/jmlr/2020/kobak2020jmlr-optimal/}
}