Global Convergence of Over-Parameterized Deep Equilibrium Models

Abstract

A deep equilibrium model (DEQ) is implicitly defined through the equilibrium point of an infinite-depth weight-tied model with input injection. Instead of performing infinitely many computations, it computes the equilibrium point directly via root-finding and obtains gradients through implicit differentiation. In this paper, we investigate the training dynamics of over-parameterized DEQs and propose a novel probabilistic framework to overcome the challenges arising from weight sharing and infinite depth. Under a condition on the initial equilibrium point, we prove that gradient descent converges to a globally optimal solution at a linear rate for the quadratic loss. We further perform a fine-grained non-asymptotic analysis of random DEQs and their corresponding weight-untied models, and show that the required initial condition is satisfied under mild over-parameterization. Moreover, we show that a unique equilibrium point always exists throughout training.
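To make the abstract's description concrete, the sketch below illustrates the standard DEQ formulation in NumPy: a hidden state defined implicitly by z* = tanh(W z* + U x), computed by fixed-point iteration (a simple stand-in for root-finding), with the gradient obtained via the implicit function theorem rather than backpropagating through the iterations. All dimensions, the tanh activation, and the scalar quadratic loss are illustrative assumptions, not the paper's exact setup.

```python
import numpy as np

rng = np.random.default_rng(0)
d, p = 8, 5                       # hidden width, input dimension (illustrative)
W = rng.normal(scale=0.3 / np.sqrt(d), size=(d, d))  # small norm -> contraction, fixed point exists
U = rng.normal(scale=1.0 / np.sqrt(p), size=(d, p))  # input-injection weights
a = rng.normal(size=d)            # output layer
x = rng.normal(size=p)            # input
y = 1.0                           # target for a scalar quadratic loss

def f(z):
    # One application of the weight-tied layer with input injection.
    return np.tanh(W @ z + U @ x)

# Root-finding: iterate z <- f(z) until the equilibrium z* = f(z*) is reached.
z = np.zeros(d)
for _ in range(200):
    z_next = f(z)
    if np.linalg.norm(z_next - z) < 1e-10:
        z = z_next
        break
    z = z_next

# Prediction and quadratic loss at the equilibrium point.
pred = a @ z
loss = 0.5 * (pred - y) ** 2

# Implicit differentiation: since z* = f(z*), dz*/dW = (I - J)^{-1} df/dW,
# where J is the Jacobian of f w.r.t. z at z*.
s = W @ z + U @ x
D = np.diag(1.0 - np.tanh(s) ** 2)              # derivative of tanh at z*
J = D @ W                                       # Jacobian df/dz at z*
dL_dz = (pred - y) * a                          # gradient of the loss w.r.t. z*
u = np.linalg.solve((np.eye(d) - J).T, dL_dz)   # solve (I - J)^T u = dL/dz*
dL_dW = np.outer(D @ u, z)                      # gradient w.r.t. W via the implicit function theorem

print("loss:", loss)
print("||dL/dW||:", np.linalg.norm(dL_dW))
```

Keeping W well-conditioned (here, small spectral norm) is what guarantees a unique equilibrium and makes both the fixed-point iteration and the linear solve in the backward pass well posed; the paper's initial condition plays an analogous role.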

Cite

Text

Ling et al. "Global Convergence of Over-Parameterized Deep Equilibrium Models." Artificial Intelligence and Statistics, 2023.

Markdown

[Ling et al. "Global Convergence of Over-Parameterized Deep Equilibrium Models." Artificial Intelligence and Statistics, 2023.](https://mlanthology.org/aistats/2023/ling2023aistats-global/)

BibTeX

@inproceedings{ling2023aistats-global,
  title     = {{Global Convergence of Over-Parameterized Deep Equilibrium Models}},
  author    = {Ling, Zenan and Xie, Xingyu and Wang, Qiuhao and Zhang, Zongpeng and Lin, Zhouchen},
  booktitle = {Artificial Intelligence and Statistics},
  year      = {2023},
  pages     = {767--787},
  volume    = {206},
  url       = {https://mlanthology.org/aistats/2023/ling2023aistats-global/}
}