Gradient Descent Follows the Regularization Path for General Losses

Ziwei Ji, Miroslav Dudík, Robert E. Schapire, Matus Telgarsky

COLT 2020 pp. 2109-2136

Abstract

Recent work across many machine learning disciplines has highlighted that standard descent methods, even without explicit regularization, do not merely minimize the training error, but also exhibit an implicit bias. This bias is typically towards a certain regularized solution, and relies upon the details of the learning process, for instance the use of the cross-entropy loss. In this work, we show that for empirical risk minimization over linear predictors with arbitrary convex, strictly decreasing losses, if the risk does not attain its infimum, then the gradient-descent path and the algorithm-independent regularization path converge to the same direction (whenever either converges to a direction). Using this result, we provide a justification for the widely-used exponentially-tailed losses (such as the exponential loss or the logistic loss): while this convergence to a direction for exponentially-tailed losses is necessarily to the maximum-margin direction, other losses such as polynomially-tailed losses may induce convergence to a direction with a poor margin.
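
The objects named in the abstract can be written out as a rough sketch. The notation here is assumed for illustration and is not taken from the abstract: the data $(x_i, y_i)$ with labels $y_i \in \{-1, +1\}$, the risk $\mathcal{R}$, the step size $\eta$, the radius $B$, and the choice of the Euclidean norm are all assumptions. The paper should be consulted for the precise definitions and step-size conditions.

```latex
% Assumed notation (not fixed by the abstract); requires amsmath.
% Empirical risk of a linear predictor w, with \ell convex and strictly decreasing:
\[
  \mathcal{R}(w) \;=\; \frac{1}{n} \sum_{i=1}^{n} \ell\bigl(y_i \langle w, x_i \rangle\bigr)
\]
% Gradient-descent path, with an assumed constant step size \eta > 0:
\[
  w_{t+1} \;=\; w_t - \eta \, \nabla \mathcal{R}(w_t)
\]
% Regularization path: a risk minimizer over the norm ball of radius B
% (algorithm-independent, since it depends only on \mathcal{R} and B):
\[
  \bar{w}(B) \;\in\; \operatorname*{arg\,min}_{\|w\| \le B} \mathcal{R}(w)
\]
% Claim of the abstract: if \mathcal{R} does not attain its infimum, then the
% two paths share a limit direction whenever either limit exists:
\[
  \lim_{t \to \infty} \frac{w_t}{\|w_t\|}
  \;=\;
  \lim_{B \to \infty} \frac{\bar{w}(B)}{\|\bar{w}(B)\|}
\]
% Maximum-margin direction, which the abstract identifies as the limit
% for exponentially-tailed losses:
\[
  \bar{u} \;=\; \operatorname*{arg\,max}_{\|u\| = 1} \; \min_{1 \le i \le n} \; y_i \langle u, x_i \rangle
\]
```

In this notation, linearly separable data with the exponential loss $\ell(z) = e^{-z}$ is one case where the risk does not attain its infimum: scaling a separating $w$ drives $\mathcal{R}(w)$ towards zero without reaching it, so $\|w_t\|$ grows without bound and only the direction $w_t / \|w_t\|$ can converge.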

Cite

Text

Ji et al. "Gradient Descent Follows the Regularization Path for General Losses." Conference on Learning Theory, 2020.

Markdown

[Ji et al. "Gradient Descent Follows the Regularization Path for General Losses." Conference on Learning Theory, 2020.](https://mlanthology.org/colt/2020/ji2020colt-gradient/)

BibTeX

@inproceedings{ji2020colt-gradient,
  title     = {{Gradient Descent Follows the Regularization Path for General Losses}},
  author    = {Ji, Ziwei and Dudík, Miroslav and Schapire, Robert E. and Telgarsky, Matus},
  booktitle = {Conference on Learning Theory},
  year      = {2020},
  pages     = {2109-2136},
  volume    = {125},
  url       = {https://mlanthology.org/colt/2020/ji2020colt-gradient/}
}