Implicit Regularization of AdaDelta

Abstract

We consider the AdaDelta adaptive optimization algorithm on locally Lipschitz, positively homogeneous, and o-minimally definable neural networks, with either the exponential or the logistic loss. We prove that, after achieving perfect training accuracy, the resulting adaptive gradient flows converge in direction to a Karush-Kuhn-Tucker point of the margin maximization problem, i.e., they perform the same implicit regularization as the plain gradient flows. We also prove that the loss decreases to zero and the Euclidean norm of the parameters increases to infinity at the same rates as for the plain gradient flows. Moreover, we consider generalizations of AdaDelta where the exponential decay coefficients may vary with time and the numerical stability terms may be different across the parameters, and we obtain the same results provided the former do not approach 1 too quickly and the latter have isotropic quotients. Finally, we corroborate our theoretical results with numerical experiments on convolutional networks trained on the MNIST and CIFAR-10 datasets.
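For readers unfamiliar with the optimizer under study, the following is a minimal sketch of the standard discrete-time AdaDelta update (Zeiler, 2012), written in NumPy. The function name and variable names are illustrative only, and the scalar decay coefficient rho and stability term eps stand in for the time-varying coefficients and per-parameter stability terms that the paper's generalizations allow; the paper itself analyzes the corresponding continuous-time adaptive gradient flows rather than this discrete rule.

import numpy as np

def adadelta_step(x, grad, Eg2, Edx2, rho=0.95, eps=1e-6):
    """One AdaDelta update (Zeiler, 2012) -- illustrative sketch.

    x    : current parameters
    grad : gradient of the loss at x
    Eg2  : running average of squared gradients
    Edx2 : running average of squared updates
    rho  : exponential decay coefficient (the paper also allows it to vary with time)
    eps  : numerical stability term (the paper also allows per-parameter values
           with isotropic quotients)
    """
    # Accumulate the exponentially decayed average of squared gradients.
    Eg2 = rho * Eg2 + (1.0 - rho) * grad**2
    # Scale the gradient by the ratio of the two root-mean-square terms.
    dx = -np.sqrt(Edx2 + eps) / np.sqrt(Eg2 + eps) * grad
    # Accumulate the exponentially decayed average of squared updates.
    Edx2 = rho * Edx2 + (1.0 - rho) * dx**2
    return x + dx, Eg2, Edx2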

Cite

Text

Englert et al. "Implicit Regularization of AdaDelta." Transactions on Machine Learning Research, 2024.

Markdown

[Englert et al. "Implicit Regularization of AdaDelta." Transactions on Machine Learning Research, 2024.](https://mlanthology.org/tmlr/2024/englert2024tmlr-implicit/)

BibTeX

@article{englert2024tmlr-implicit,
  title     = {{Implicit Regularization of AdaDelta}},
  author    = {Englert, Matthias and Lazic, Ranko and Semler, Avi},
  journal   = {Transactions on Machine Learning Research},
  year      = {2024},
  url       = {https://mlanthology.org/tmlr/2024/englert2024tmlr-implicit/}
}