Linear Regularizers Enforce the Strict Saddle Property
Abstract
Satisfaction of the strict saddle property has become a standard assumption in non-convex optimization, and it ensures that many first-order optimization algorithms will almost always escape saddle points. However, functions exist in machine learning that do not satisfy this property, such as the loss function of a neural network with at least two hidden layers. First-order methods such as gradient descent may converge to non-strict saddle points of such functions, and there do not currently exist any first-order methods that reliably escape non-strict saddle points. To address this need, we demonstrate that regularizing a function with a linear term enforces the strict saddle property, and we provide justification for only regularizing locally, i.e., when the norm of the gradient falls below a certain threshold. We analyze bifurcations that may result from this form of regularization, and then we provide a selection rule for regularizers that depends only on the gradient of an objective function. This rule is shown to guarantee that gradient descent will escape the neighborhoods around a broad class of non-strict saddle points, and this behavior is demonstrated on numerical examples of non-strict saddle points common in the optimization literature.
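The idea of regularizing locally can be illustrated with a minimal sketch. It is not the authors' selection rule, which the abstract does not state. The test function f(x, y) = x^2 + y^3 (a non-strict saddle at the origin, where the Hessian has eigenvalues 2 and 0), the fixed regularizer vector `a`, the threshold, the step size, and the escape radius are all assumptions chosen for illustration. When the gradient norm of f drops below the threshold, the sketch adds the gradient of a linear term a·p, which is simply `a`, to the descent direction.

```python
import numpy as np

def grad_f(p):
    # Gradient of the illustrative objective f(x, y) = x**2 + y**3.
    x, y = p
    return np.array([2.0 * x, 3.0 * y**2])

def gradient_descent(p0, step=0.05, iters=500, threshold=None, a=None, radius=1.0):
    # Plain gradient descent when threshold is None; otherwise the linear
    # term a.p is added to the objective only where ||grad f|| < threshold.
    p = np.array(p0, dtype=float)
    for _ in range(iters):
        g = grad_f(p)
        if threshold is not None and np.linalg.norm(g) < threshold:
            g = g + a  # gradient of f(p) + a.p is grad f(p) + a
        p = p - step * g
        if np.linalg.norm(p) > radius:
            break  # iterate has left the neighborhood of the saddle
    return p

start = (0.5, 0.0)
# Hypothetical regularizer: a fixed vector, not the paper's selection rule.
a = np.array([0.0, 0.05])

plain = gradient_descent(start)
escaped = gradient_descent(start, threshold=0.1, a=a)

print("plain GD final iterate:      ", plain)
print("regularized GD final iterate:", escaped)
```

In this sketch the unregularized iterates stay on the line y = 0 and converge to the saddle at the origin, while the locally regularized iterates are pushed along the flat direction and leave the unit ball around it. Because f is unbounded below, the loop stops as soon as the iterate exits that ball.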
Cite
Text
Ubl et al. "Linear Regularizers Enforce the Strict Saddle Property." AAAI Conference on Artificial Intelligence, 2023. doi:10.1609/AAAI.V37I8.26194
Markdown
[Ubl et al. "Linear Regularizers Enforce the Strict Saddle Property." AAAI Conference on Artificial Intelligence, 2023.](https://mlanthology.org/aaai/2023/ubl2023aaai-linear/) doi:10.1609/AAAI.V37I8.26194
BibTeX
@inproceedings{ubl2023aaai-linear,
title = {{Linear Regularizers Enforce the Strict Saddle Property}},
author = {Ubl, Matthew and Hale, Matthew and Yazdani, Kasra},
booktitle = {AAAI Conference on Artificial Intelligence},
year = {2023},
pages = {10017-10024},
doi = {10.1609/AAAI.V37I8.26194},
url = {https://mlanthology.org/aaai/2023/ubl2023aaai-linear/}
}