Implicit Jacobian Regularization Weighted with Impurity of Probability Output

Abstract

The success of deep learning is largely attributed to stochastic gradient descent (SGD), yet it remains unclear how SGD finds well-generalizing models. We demonstrate that SGD has an implicit regularization effect on the logit-weight Jacobian norm of neural networks. This regularization effect is weighted with the impurity of the probability output, and it is therefore active only in a certain phase of training. Based on these findings, we propose a novel optimization method that explicitly regularizes the Jacobian norm, achieving performance comparable to that of other state-of-the-art sharpness-aware optimization methods.
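The abstract only summarizes the proposed idea, so the sketch below illustrates one way an explicit, impurity-weighted logit-weight Jacobian-norm penalty could be implemented in PyTorch. It is an assumption-based sketch, not the paper's algorithm: the function name jacobian_penalty, the coefficient lam, and the single-sample Hutchinson-style norm estimator are all illustrative choices.

import torch
import torch.nn.functional as F

def jacobian_penalty(model, x, y, lam=0.01):
    """Cross-entropy loss plus an impurity-weighted logit-weight
    Jacobian-norm penalty (illustrative sketch, not the paper's method)."""
    logits = model(x)                                    # (batch, num_classes)
    ce_loss = F.cross_entropy(logits, y)

    # Gini impurity of the probability output: 1 - sum_k p_k^2, per example.
    probs = F.softmax(logits, dim=1)
    impurity = (1.0 - (probs ** 2).sum(dim=1)).detach()  # (batch,)

    # Hutchinson-style single-sample estimate of the squared Frobenius norm of
    # d(logits)/d(weights): project the logits onto a random Gaussian vector,
    # differentiate the impurity-weighted projection w.r.t. the weights, and
    # take the squared norm of the result. In expectation, each example's
    # squared Jacobian norm is weighted by its impurity.
    v = torch.randn_like(logits)
    proj = (impurity.sqrt() * (logits * v).sum(dim=1)).sum()
    params = [p for p in model.parameters() if p.requires_grad]
    grads = torch.autograd.grad(proj, params, create_graph=True, allow_unused=True)
    jac_sq = sum(g.pow(2).sum() for g in grads if g is not None)

    return ce_loss + lam * jac_sq

A typical training step would then be loss = jacobian_penalty(model, x, y); loss.backward(); optimizer.step(). In practice the penalty coefficient (here lam) and the number of random projections used in the estimator would need tuning.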

Cite

Text

Lee et al. "Implicit Jacobian Regularization Weighted with Impurity of Probability Output." International Conference on Machine Learning, 2023.

Markdown

[Lee et al. "Implicit Jacobian Regularization Weighted with Impurity of Probability Output." International Conference on Machine Learning, 2023.](https://mlanthology.org/icml/2023/lee2023icml-implicit/)

BibTeX

@inproceedings{lee2023icml-implicit,
  title     = {{Implicit Jacobian Regularization Weighted with Impurity of Probability Output}},
  author    = {Lee, Sungyoon and Park, Jinseong and Lee, Jaewook},
  booktitle = {International Conference on Machine Learning},
  year      = {2023},
  pages     = {19141--19184},
  volume    = {202},
  url       = {https://mlanthology.org/icml/2023/lee2023icml-implicit/}
}