Global Convergence of Over-Parameterized Deep Equilibrium Models
Abstract
A deep equilibrium model (DEQ) is defined implicitly through an equilibrium point of an infinite-depth weight-tied model with input injection. Instead of unrolling infinitely many layers, a DEQ computes the equilibrium point directly via root-finding and obtains gradients by implicit differentiation. In this paper, we investigate the training dynamics of over-parameterized DEQs and propose a novel probabilistic framework to overcome the challenges arising from weight-sharing and infinite depth. Under a condition on the initial equilibrium point, we prove that gradient descent converges to a globally optimal solution at a linear rate for the quadratic loss. We further perform a fine-grained non-asymptotic analysis of random DEQs and their weight-untied counterparts, and show that the required initial condition is satisfied under mild over-parameterization. Moreover, we show that a unique equilibrium point always exists throughout training.
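The abstract describes the DEQ forward and backward computations only at a high level. Below is a minimal NumPy sketch, not the paper's code: the names `deq_forward` and `deq_input_jacobian`, the specific parameterization z* = tanh(W z* + U x), and the fixed-point iteration solver are all illustrative assumptions; the gradient step only uses the implicit function theorem, so no solver iterations are differentiated through.

```python
import numpy as np

def deq_forward(W, U, x, max_iter=500, tol=1e-8):
    """Solve the equilibrium z* = tanh(W z* + U x) by fixed-point iteration.
    (The paper assumes a root-finding solver; plain iteration is the simplest stand-in.)"""
    z = np.zeros(W.shape[0])
    for _ in range(max_iter):
        z_new = np.tanh(W @ z + U @ x)
        if np.linalg.norm(z_new - z) < tol:
            return z_new
        z = z_new
    return z

def deq_input_jacobian(W, U, x, z_star):
    """dz*/dx via implicit differentiation: differentiating the fixed-point
    equation gives (I - D W) dz* = D U dx, with D = diag(1 - tanh^2(W z* + U x))."""
    pre = W @ z_star + U @ x
    D = np.diag(1.0 - np.tanh(pre) ** 2)
    return np.linalg.solve(np.eye(len(z_star)) - D @ W, D @ U)

# Toy usage: a small random DEQ layer, scaled so the fixed-point map is contractive.
rng = np.random.default_rng(0)
n, d = 8, 4
W = 0.4 * rng.standard_normal((n, n)) / np.sqrt(n)
U = rng.standard_normal((n, d)) / np.sqrt(d)
x = rng.standard_normal(d)
z_star = deq_forward(W, U, x)
J = deq_input_jacobian(W, U, x, z_star)  # shape (n, d)
```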
Cite
Text
Ling et al. "Global Convergence of Over-Parameterized Deep Equilibrium Models." Artificial Intelligence and Statistics, 2023.
BibTeX
@inproceedings{ling2023aistats-global,
title = {{Global Convergence of Over-Parameterized Deep Equilibrium Models}},
author = {Ling, Zenan and Xie, Xingyu and Wang, Qiuhao and Zhang, Zongpeng and Lin, Zhouchen},
booktitle = {Artificial Intelligence and Statistics},
year = {2023},
pages = {767-787},
volume = {206},
url = {https://mlanthology.org/aistats/2023/ling2023aistats-global/}
}