Averaging on the Bures-Wasserstein Manifold: Dimension-Free Convergence of Gradient Descent
Abstract
We study first-order optimization algorithms for computing the barycenter of Gaussian distributions with respect to the optimal transport metric. Although the objective is geodesically non-convex, Riemannian gradient descent empirically converges rapidly, in fact faster than off-the-shelf methods such as Euclidean gradient descent and SDP solvers. This stands in stark contrast to the best-known theoretical results, which depend exponentially on the dimension. In this work, we prove new geodesic convexity results which provide stronger control of the iterates, yielding a dimension-free convergence rate. Our techniques also enable the analysis of two related notions of averaging, the entropically-regularized barycenter and the geometric median, providing the first convergence guarantees for these problems.
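As a concrete illustration of the algorithm discussed in the abstract, here is a minimal Python sketch (not the authors' implementation) of Riemannian gradient descent on the Bures-Wasserstein manifold for centered Gaussians represented by their covariance matrices. It assumes the standard geodesic update Sigma <- T Sigma T with T = (1 - eta) I + eta * sum_i w_i T_i, where T_i is the optimal transport map from N(0, Sigma) to N(0, Sigma_i); the function names, the initialization at the Euclidean mean, and the stopping tolerance are illustrative choices.

# Minimal sketch of Riemannian gradient descent for the
# Bures-Wasserstein barycenter of centered Gaussians (illustrative only).
import numpy as np
from scipy.linalg import sqrtm

def ot_map(sigma, sigma_i):
    # OT map from N(0, sigma) to N(0, sigma_i):
    # T = sigma^{-1/2} (sigma^{1/2} sigma_i sigma^{1/2})^{1/2} sigma^{-1/2}
    root = np.real(sqrtm(sigma))
    root_inv = np.linalg.inv(root)
    middle = np.real(sqrtm(root @ sigma_i @ root))
    return root_inv @ middle @ root_inv

def bw_barycenter(covs, weights=None, step=1.0, iters=200, tol=1e-10):
    # Riemannian gradient descent: Sigma <- T Sigma T with
    # T = (1 - step) * I + step * sum_i w_i * ot_map(Sigma, Sigma_i).
    n, d = len(covs), covs[0].shape[0]
    w = np.full(n, 1.0 / n) if weights is None else np.asarray(weights)
    sigma = np.mean(covs, axis=0)  # initialize at the Euclidean mean
    for _ in range(iters):
        T = (1.0 - step) * np.eye(d) + step * sum(
            wi * ot_map(sigma, ci) for wi, ci in zip(w, covs))
        new_sigma = T @ sigma @ T  # geodesic step on the BW manifold
        if np.linalg.norm(new_sigma - sigma) < tol:
            return new_sigma
        sigma = new_sigma
    return sigma

With step size eta = 1, the update reduces to the classical fixed-point iteration for the barycenter covariance.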
Cite
Text
Altschuler et al. "Averaging on the Bures-Wasserstein Manifold: Dimension-Free Convergence of Gradient Descent." Neural Information Processing Systems, 2021.
Markdown
[Altschuler et al. "Averaging on the Bures-Wasserstein Manifold: Dimension-Free Convergence of Gradient Descent." Neural Information Processing Systems, 2021.](https://mlanthology.org/neurips/2021/altschuler2021neurips-averaging/)
BibTeX
@inproceedings{altschuler2021neurips-averaging,
  title = {{Averaging on the Bures-Wasserstein Manifold: Dimension-Free Convergence of Gradient Descent}},
  author = {Altschuler, Jason and Chewi, Sinho and Gerber, Patrik R and Stromme, Austin},
  booktitle = {Neural Information Processing Systems},
  year = {2021},
  url = {https://mlanthology.org/neurips/2021/altschuler2021neurips-averaging/}
}