Robust and Proper Learning for Mixtures of Gaussians via Systems of Polynomial Inequalities
Abstract
Learning a Gaussian mixture model (GMM) is a fundamental statistical problem. One common notion of learning a GMM is proper learning: here, the goal is to find a mixture of $k$ Gaussians $\mathcal{M}$ that is close to the unknown density $f$ from which we draw samples. The distance between $\mathcal{M}$ and $f$ is often measured in the total variation / $L_1$-distance. Our main result is an algorithm for learning a mixture of $k$ univariate Gaussians that is \emph{nearly-optimal} for any fixed $k$. It is well known that the sample complexity of properly learning a univariate $k$-GMM is $\widetilde{O}(k / ε^2)$. However, the best prior \emph{running time} for this problem is $\widetilde{O}(1 / ε^{3k-1})$; in particular, the dependence between $1/ε$ and $k$ is exponential. In this paper, we significantly improve this dependence by replacing the $1/ε$ term with $\log 1/ε$, while only increasing the exponent moderately. Specifically, the running time of our algorithm is $(k \cdot \log 1/ε)^{O(k^4)} + \widetilde{O}(k / ε^2)$. For any fixed $k$, the $\widetilde{O}(k / ε^2)$ term dominates our running time, and thus our algorithm runs in time which is \emph{nearly-linear} in the number of samples drawn. Achieving a running time of $\text{poly}(k, 1 / ε)$ for proper learning of $k$-GMMs has recently been stated as an open problem by multiple researchers, and we make progress on this question. Our main algorithmic ingredient is a new connection between proper learning of parametric distributions and systems of polynomial inequalities. We leverage results for piecewise polynomial approximation of GMMs and reduce the learning problem to a much smaller sub-problem. While this sub-problem is still non-convex, its size depends only logarithmically on the final accuracy $ε$. Hence we can invoke computationally expensive methods for solving the sub-problem. We show that our connection is also useful in the multivariate setting, where we get new results for learning a mixture of two spherical Gaussians. A variant of our approach is also within reach of modern computer algebra systems. Experiments for learning a 2-GMM show promising results: our algorithm improves over the popular Expectation-Maximization (EM) algorithm in the noisy setting.
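To make the two-stage recipe sketched in the abstract concrete, the following is a minimal illustrative sketch in Python. It is not the paper's algorithm: the piecewise-polynomial density estimate is replaced by a plain histogram, and the small non-convex fitting sub-problem (solved in the paper via a system of polynomial inequalities) is handed to a generic numerical optimizer (scipy.optimize.minimize). All variable names, bin counts, and parameter choices below are assumptions made only for illustration.

# Illustrative sketch, not the paper's algorithm.
# Stage 1: build a cheap nonparametric density estimate from the samples.
# Stage 2: fit 2-GMM parameters by minimizing the (approximate) L1 distance
#          to that estimate; a generic optimizer stands in for the
#          polynomial-system solver used in the paper.
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(0)

# Draw samples from an unknown 2-GMM (ground truth is used only to generate data).
samples = np.concatenate([rng.normal(-2.0, 1.0, 5000), rng.normal(3.0, 0.5, 5000)])

# Stage 1: histogram density estimate (stand-in for the piecewise-polynomial fit).
hist, edges = np.histogram(samples, bins=200, density=True)
centers = 0.5 * (edges[:-1] + edges[1:])
grid = np.linspace(samples.min(), samples.max(), 400)
f_hat = np.interp(grid, centers, hist)

def mixture_pdf(params, x):
    # Density of a two-component univariate Gaussian mixture.
    w, mu1, s1, mu2, s2 = params
    return w * norm.pdf(x, mu1, s1) + (1 - w) * norm.pdf(x, mu2, s2)

def l1_to_estimate(params):
    # Approximate L1 distance between the candidate mixture and the estimate.
    return np.trapz(np.abs(mixture_pdf(params, grid) - f_hat), grid)

# Stage 2: fit the mixture to the estimate (stand-in for the polynomial-system step).
x0 = [0.5, np.percentile(samples, 25), 1.0, np.percentile(samples, 75), 1.0]
bounds = [(0.01, 0.99), (None, None), (1e-3, None), (None, None), (1e-3, None)]
res = minimize(l1_to_estimate, x0, bounds=bounds, method="L-BFGS-B")
print("fitted (weight, mu1, sigma1, mu2, sigma2):", np.round(res.x, 3))

The point of the two-stage structure is that only the first stage touches all the samples; the fitting step operates on a compressed summary of the data, which is what allows the expensive sub-problem to stay small.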
Cite
Text
Li and Schmidt. "Robust and Proper Learning for Mixtures of Gaussians via Systems of Polynomial Inequalities." Proceedings of the 2017 Conference on Learning Theory, 2017.
Markdown
[Li and Schmidt. "Robust and Proper Learning for Mixtures of Gaussians via Systems of Polynomial Inequalities." Proceedings of the 2017 Conference on Learning Theory, 2017.](https://mlanthology.org/colt/2017/li2017colt-robust/)
BibTeX
@inproceedings{li2017colt-robust,
title = {{Robust and Proper Learning for Mixtures of Gaussians via Systems of Polynomial Inequalities}},
author = {Li, Jerry and Schmidt, Ludwig},
booktitle = {Proceedings of the 2017 Conference on Learning Theory},
year = {2017},
pages = {1302--1382},
volume = {65},
url = {https://mlanthology.org/colt/2017/li2017colt-robust/}
}