MoXCo: How I Learned to Stop Exploring and Love My Local Minima?

Abstract

Deep Neural Networks (DNNs) are well known for generalizing despite heavy overparameterization, a property commonly attributed to the optimizer's ability to find "good" solutions in high-dimensional loss landscapes. However, widely used adaptive optimizers such as Adam can suffer from subpar generalization. This paper presents $\textit{MoXCo}$, a methodology for designing adaptive optimizers that not only expedite exploration with faster convergence, but also avoid over-exploitation in specific parameter regimes, ultimately converging to good solutions.
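For context, the adaptive baseline the abstract refers to is the standard Adam update (Kingma & Ba, 2015). The sketch below shows that update in plain NumPy; it is illustrative only and does not reproduce MoXCo's exploration/exploitation control, whose details are given in the paper itself.

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One standard Adam update step (not MoXCo).

    theta: current parameters, grad: gradient at theta,
    m, v: running first/second moment estimates, t: step count (1-indexed).
    """
    m = beta1 * m + (1 - beta1) * grad        # first-moment (momentum) estimate
    v = beta2 * v + (1 - beta2) * grad ** 2   # second-moment (adaptive scaling) estimate
    m_hat = m / (1 - beta1 ** t)              # bias correction for initialization at zero
    v_hat = v / (1 - beta2 ** t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v
```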

Cite

Text

Singh et al. "MoXCo: How I Learned to Stop Exploring and Love My Local Minima?" NeurIPS 2023 Workshops: M3L, 2023.

Markdown

[Singh et al. "MoXCo: How I Learned to Stop Exploring and Love My Local Minima?" NeurIPS 2023 Workshops: M3L, 2023.](https://mlanthology.org/neuripsw/2023/singh2023neuripsw-moxco/)

BibTeX

@inproceedings{singh2023neuripsw-moxco,
  title     = {{MoXCo: How I Learned to Stop Exploring and Love My Local Minima?}},
  author    = {Singh, Esha and Sabach, Shoham and Wang, Yu-Xiang},
  booktitle = {NeurIPS 2023 Workshops: M3L},
  year      = {2023},
  url       = {https://mlanthology.org/neuripsw/2023/singh2023neuripsw-moxco/}
}