Moment-Based Uniform Deviation Bounds for $k$-Means and Friends

Abstract

Suppose $k$ centers are fit to $m$ points by heuristically minimizing the $k$-means cost; what is the corresponding fit over the source distribution? This question is resolved here for distributions with $p\geq 4$ bounded moments; in particular, the difference between the sample cost and distribution cost decays with $m$ and $p$ as $m^{\min\{-1/4, -1/2+2/p\}}$. The essential technical contribution is a mechanism to uniformly control deviations in the face of unbounded parameter sets, cost functions, and source distributions. To further demonstrate this mechanism, a soft clustering variant of $k$-means cost is also considered, namely the log likelihood of a Gaussian mixture, subject to the constraint that all covariance matrices have bounded spectrum. Lastly, a rate with refined constants is provided for $k$-means instances possessing some cluster structure.
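The quantity bounded in the paper is the deviation between the empirical $k$-means cost of the fitted centers and their cost under the source distribution. The following is a minimal illustrative sketch (not from the paper) that estimates this gap empirically: it fits centers with Lloyd's heuristic on a sample of size $m$ and compares the sample cost against a Monte Carlo approximation of the distribution cost on a large held-out sample. The choice of distribution (Student-t with 5 degrees of freedom, so $p=4$ moments are bounded), $k$, and the sample sizes are arbitrary assumptions for illustration only.

# Illustrative sketch: sample vs. (approximate) distribution k-means cost.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
k, d = 3, 2

def cost(X, centers):
    # Average squared distance from each point to its nearest center.
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d2.min(axis=1).mean()

def sample(m):
    # Heavy-tailed source: Student-t with 5 dof has bounded moments up to order 4.
    return rng.standard_t(df=5, size=(m, d))

X_ref = sample(200_000)  # large held-out sample standing in for the distribution

for m in (100, 1_000, 10_000):
    X = sample(m)
    centers = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X).cluster_centers_
    gap = abs(cost(X, centers) - cost(X_ref, centers))
    print(f"m = {m:6d}   |sample cost - (approx) distribution cost| = {gap:.4f}")

Under the paper's result, this gap decays at rate $m^{\min\{-1/4,\,-1/2+2/p\}}$ uniformly over center placements, which for $p=4$ gives the $m^{-1/4}$ regime.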

Cite

Text

Telgarsky and Dasgupta. "Moment-Based Uniform Deviation Bounds for $k$-Means and Friends." Neural Information Processing Systems, 2013.

Markdown

[Telgarsky and Dasgupta. "Moment-Based Uniform Deviation Bounds for $k$-Means and Friends." Neural Information Processing Systems, 2013.](https://mlanthology.org/neurips/2013/telgarsky2013neurips-momentbased/)

BibTeX

@inproceedings{telgarsky2013neurips-momentbased,
  title     = {{Moment-Based Uniform Deviation Bounds for $k$-Means and Friends}},
  author    = {Telgarsky, Matus J and Dasgupta, Sanjoy},
  booktitle = {Neural Information Processing Systems},
  year      = {2013},
  pages     = {2940--2948},
  url       = {https://mlanthology.org/neurips/2013/telgarsky2013neurips-momentbased/}
}