The Optimal Ridge Penalty for Real-World High-Dimensional Data Can Be Zero or Negative Due to the Implicit Ridge Regularization

Abstract

Conventional wisdom in statistical learning holds that large models require strong regularization to prevent overfitting. Here we show that this rule can be violated by linear regression in the underdetermined $n\ll p$ situation under realistic conditions. Using simulations and real-life high-dimensional datasets, we demonstrate that an explicit positive ridge penalty can fail to provide any improvement over the minimum-norm least squares estimator. Moreover, the optimal value of the ridge penalty in this situation can be negative. This happens when the high-variance directions in the predictor space can predict the response variable, which is often the case in real-world high-dimensional data. In this regime, low-variance directions provide an implicit ridge regularization and can make any further positive ridge penalty detrimental. We prove that augmenting any linear model with random covariates and using the minimum-norm estimator is asymptotically equivalent to adding a ridge penalty. We use a spiked covariance model as an analytically tractable example and prove that the optimal ridge penalty in this case is negative when $n\ll p$.
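
The claimed equivalence between random-covariate augmentation and ridge regression can be checked numerically. The following is a minimal sketch, not the authors' code. It assumes that the $q$ added covariates are i.i.d. Gaussian with variance $\lambda/q$, so that their Gram matrix approaches $\lambda I$ as $q$ grows. The function names, problem sizes, and random seed are illustrative choices and do not come from the paper.

```python
import numpy as np

def ridge_estimator(X, y, lam):
    """Ridge solution in its dual form: X^T (X X^T + lam * I)^{-1} y."""
    n = X.shape[0]
    return X.T @ np.linalg.solve(X @ X.T + lam * np.eye(n), y)

def augmented_min_norm(X, y, lam, q, rng):
    """Minimum-norm least squares on [X, Z], returning only the coefficients on X.

    Z holds q random covariates with variance lam / q (an assumed scaling),
    so that Z Z^T is close to lam * I when q is large.
    """
    n, p = X.shape
    Z = rng.normal(scale=np.sqrt(lam / q), size=(n, q))
    A = np.hstack([X, Z])
    # A has full row rank here, so the minimum-norm solution is A^T (A A^T)^{-1} y.
    beta_aug = A.T @ np.linalg.solve(A @ A.T, y)
    return beta_aug[:p]

rng = np.random.default_rng(0)
n, p, lam = 20, 50, 5.0  # underdetermined toy problem (n < p)
X = rng.normal(size=(n, p))
y = X @ rng.normal(size=p) + rng.normal(size=n)

beta_ridge = ridge_estimator(X, y, lam)
rel_errors = {}
for q in (100, 1_000, 10_000):
    beta_q = augmented_min_norm(X, y, lam, q, rng)
    rel_errors[q] = np.linalg.norm(beta_q - beta_ridge) / np.linalg.norm(beta_ridge)
    print(f"q = {q:>6}: relative difference from ridge = {rel_errors[q]:.4f}")
```

The dual form of the ridge estimator is used because it makes the mechanism visible: the augmented Gram matrix is $XX^\top + ZZ^\top$, and the second term plays the role of $\lambda I$. The printed relative difference should become small as $q$ increases, although the exact values depend on the random draw.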

Cite

Text

Kobak et al. "The Optimal Ridge Penalty for Real-World High-Dimensional Data Can Be Zero or Negative Due to the Implicit Ridge Regularization." Journal of Machine Learning Research, 2020.

Markdown

[Kobak et al. "The Optimal Ridge Penalty for Real-World High-Dimensional Data Can Be Zero or Negative Due to the Implicit Ridge Regularization." Journal of Machine Learning Research, 2020.](https://mlanthology.org/jmlr/2020/kobak2020jmlr-optimal/)

BibTeX

@article{kobak2020jmlr-optimal,
  title     = {{The Optimal Ridge Penalty for Real-World High-Dimensional Data Can Be Zero or Negative Due to the Implicit Ridge Regularization}},
  author    = {Kobak, Dmitry and Lomond, Jonathan and Sanchez, Benoit},
  journal   = {Journal of Machine Learning Research},
  year      = {2020},
  pages     = {1-16},
  volume    = {21},
  url       = {https://mlanthology.org/jmlr/2020/kobak2020jmlr-optimal/}
}