MoXCo: How I Learned to Stop Exploring and Love My Local Minima?
Abstract
Deep Neural Networks (DNNs) are well known for generalizing despite overparameterization, a property commonly attributed to the optimizer’s ability to find “good” solutions within high-dimensional loss landscapes. However, widely used adaptive optimizers such as Adam can suffer from subpar generalization. This paper presents $\textit{MoXCo}$, a methodology for designing adaptive optimizers that not only expedite exploration with faster convergence but also avoid over-exploitation in specific parameter regimes, ultimately converging to good solutions.
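The abstract contrasts MoXCo with standard adaptive optimizers such as Adam. For reference, below is a minimal NumPy sketch of the baseline Adam update (Kingma & Ba, 2015) on a toy quadratic. The abstract does not specify MoXCo's exploration/exploitation criterion, so only the baseline update is shown; the function name `adam_step` and the toy loss are illustrative choices, not part of the paper.

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One standard Adam update (Kingma & Ba, 2015).
    Illustrative baseline only; this is not MoXCo's modified update."""
    m = beta1 * m + (1 - beta1) * grad        # first-moment (mean) estimate
    v = beta2 * v + (1 - beta2) * grad ** 2   # second-moment (uncentered variance) estimate
    m_hat = m / (1 - beta1 ** t)              # bias correction for initialization at zero
    v_hat = v / (1 - beta2 ** t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

# Toy usage: minimize f(theta) = ||theta||^2 / 2, whose gradient is theta.
theta = np.array([1.0, -2.0])
m = np.zeros_like(theta)
v = np.zeros_like(theta)
for t in range(1, 1001):
    grad = theta
    theta, m, v = adam_step(theta, grad, m, v, t, lr=1e-2)
print(theta)  # approaches the minimizer at the origin
```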
Cite
Text
Singh et al. "MoXCo: How I Learned to Stop Exploring and Love My Local Minima?" NeurIPS 2023 Workshops: M3L, 2023.
Markdown
[Singh et al. "MoXCo: How I Learned to Stop Exploring and Love My Local Minima?" NeurIPS 2023 Workshops: M3L, 2023.](https://mlanthology.org/neuripsw/2023/singh2023neuripsw-moxco/)
BibTeX
@inproceedings{singh2023neuripsw-moxco,
title = {{MoXCo: How I Learned to Stop Exploring and Love My Local Minima?}},
author = {Singh, Esha and Sabach, Shoham and Wang, Yu-Xiang},
booktitle = {NeurIPS 2023 Workshops: M3L},
year = {2023},
url = {https://mlanthology.org/neuripsw/2023/singh2023neuripsw-moxco/}
}