Beyond Gaussians: Spectral Methods for Learning Mixtures of Heavy-Tailed Distributions
Abstract
We study the problem of learning mixtures of distributions, a natural formalization of clustering. A mixture of distributions is a collection of distributions D = D1, . . . , DT and weights w1, . . . , wT . A sample from a mixture is drawn by selecting Di with probability wi and then selecting a sample from Di. The goal, in learning a mixture, is to learn the parameters of the distributions comprising the mixture, given only samples from the mixture. In this paper, we focus on learning mixtures of heavy-tailed product distributions, which was studied by [DHKS05]. The challenge in learning such mixtures is that the techniques developed for learning mixture-models, such as spectral methods and distance concentration, do not apply. The previous algorithm for this problem was due to [DHKS05], which achieved performance comparable to the algorithms of [AM05, KSV05, CR08] given a mixture of Gaussians, but took time exponential in the dimension. We provide an algorithm which has the same performance, but runs in polynomial time. Our main contribution is an embedding which transforms a mixture of heavy-tailed product distributions into a mixture of distributions over the hypercubein a higher dimension, while still maintaining separability. Combining this embedding with standard spectral techniques results in algorithms that can learn mixtures of heavy-tailed distributions with separation comparable to the guarantees of [DHKS05]. Our algorithm runs in time polynomial in the dimension, number of clusters, and imbalance in the weights.
Cite
Text
Chaudhuri and Rao. "Beyond Gaussians: Spectral Methods for Learning Mixtures of Heavy-Tailed Distributions." Annual Conference on Computational Learning Theory, 2008.Markdown
[Chaudhuri and Rao. "Beyond Gaussians: Spectral Methods for Learning Mixtures of Heavy-Tailed Distributions." Annual Conference on Computational Learning Theory, 2008.](https://mlanthology.org/colt/2008/chaudhuri2008colt-beyond/)BibTeX
@inproceedings{chaudhuri2008colt-beyond,
title = {{Beyond Gaussians: Spectral Methods for Learning Mixtures of Heavy-Tailed Distributions}},
author = {Chaudhuri, Kamalika and Rao, Satish},
booktitle = {Annual Conference on Computational Learning Theory},
year = {2008},
pages = {21-32},
url = {https://mlanthology.org/colt/2008/chaudhuri2008colt-beyond/}
}