The Convex Geometry of Backpropagation: Neural Network Gradient Flows Converge to Extreme Points of the Dual Convex Program
Abstract
We study non-convex subgradient flows for training two-layer ReLU neural networks from a convex geometry and duality perspective. We characterize the implicit bias of unregularized non-convex gradient flow as convex regularization of an equivalent convex model. We then show that the limit points of non-convex subgradient flows can be identified via primal-dual correspondence in this convex optimization problem. Moreover, we derive a sufficient condition on the dual variables that ensures the stationary points of the non-convex objective are the KKT points of the convex objective, thus proving convergence of non-convex gradient flows to the global optimum. For a class of regular training data distributions, such as orthogonally separable data, we show that this sufficient condition holds. Therefore, non-convex gradient flows in fact converge to optimal solutions of a convex optimization problem. We present numerical results verifying the predictions of our theory for non-convex subgradient descent.
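To make the setting concrete, here is a minimal sketch of the kind of training dynamics the abstract analyzes: unregularized subgradient descent on a two-layer ReLU network over an orthogonally separable toy dataset. The squared loss, the dataset construction, and every name and hyperparameter below are our illustrative assumptions, not details taken from the paper.

```python
# Minimal sketch (not the paper's code) of unregularized subgradient
# descent on a two-layer ReLU network over orthogonally separable data.
import numpy as np

rng = np.random.default_rng(0)

# Orthogonally separable toy data: same-class points have positive inner
# products and cross-class inner products are <= 0. Two clusters supported
# on orthogonal axes satisfy this by construction.
n, d, m = 20, 2, 8
X_pos = rng.uniform(0.5, 1.5, size=(n // 2, 1)) * np.array([[1.0, 0.0]])
X_neg = rng.uniform(0.5, 1.5, size=(n // 2, 1)) * np.array([[0.0, 1.0]])
X = np.vstack([X_pos, X_neg])
y = np.concatenate([np.ones(n // 2), -np.ones(n // 2)])

# Two-layer ReLU network f(x) = sum_j alpha_j * relu(w_j^T x), no biases.
W = rng.normal(scale=0.1, size=(d, m))
alpha = rng.normal(scale=0.1, size=m)

def forward(X, W, alpha):
    return np.maximum(X @ W, 0.0) @ alpha

lr, steps = 0.05, 30_000
for _ in range(steps):
    H = np.maximum(X @ W, 0.0)        # hidden activations, shape (n, m)
    resid = H @ alpha - y             # residuals of the squared loss
    # Subgradients of (1/2n) * ||f(X) - y||^2; the ReLU subgradient is
    # taken as the indicator of a strictly positive pre-activation.
    grad_alpha = H.T @ resid / n
    active = (X @ W > 0).astype(float)
    grad_W = X.T @ (active * np.outer(resid, alpha)) / n
    alpha -= lr * grad_alpha
    W -= lr * grad_W

pred = forward(X, W, alpha)
print("mean squared loss:", 0.5 * np.mean((pred - y) ** 2))
print("sign accuracy:", np.mean(np.sign(pred) == y))
```

On data like this, the paper's theory predicts that the limit point of the non-convex flow is a global optimum of the equivalent convex program; the sketch above only illustrates the non-convex dynamics being analyzed and does not construct the dual variables or the convex reformulation itself.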
Cite

Text

Wang and Pilanci. "The Convex Geometry of Backpropagation: Neural Network Gradient Flows Converge to Extreme Points of the Dual Convex Program." International Conference on Learning Representations, 2022.

Markdown

[Wang and Pilanci. "The Convex Geometry of Backpropagation: Neural Network Gradient Flows Converge to Extreme Points of the Dual Convex Program." International Conference on Learning Representations, 2022.](https://mlanthology.org/iclr/2022/wang2022iclr-convex/)

BibTeX
@inproceedings{wang2022iclr-convex,
  title = {{The Convex Geometry of Backpropagation: Neural Network Gradient Flows Converge to Extreme Points of the Dual Convex Program}},
  author = {Wang, Yifei and Pilanci, Mert},
  booktitle = {International Conference on Learning Representations},
  year = {2022},
  url = {https://mlanthology.org/iclr/2022/wang2022iclr-convex/}
}