Uniform Deviation Bounds for K-Means Clustering

Abstract

Uniform deviation bounds limit the difference between a model’s expected loss and its loss on an empirical sample uniformly over all models in a learning problem. In this paper, we provide a novel framework to obtain uniform deviation bounds for loss functions that are unbounded. As a result, we obtain competitive uniform deviation bounds for k-Means clustering under weak assumptions on the underlying distribution. If the fourth moment is bounded, we prove a rate of $O(m^{-1/2})$, improving on the previously known $O(m^{-1/4})$ rate. Furthermore, we show that the rate also depends on the kurtosis, i.e., the normalized fourth moment, which measures the “tailedness” of a distribution. We also provide improved rates under progressively stronger assumptions, namely bounded higher moments, subgaussianity, and bounded support of the underlying distribution.
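The quantities in the abstract can be illustrated numerically. The sketch below (an informal illustration, not the paper's proof technique) fixes one arbitrary set of k-Means centers, compares the empirical quantization loss on samples of growing size $m$ against the loss on a large "population" sample standing in for the true distribution, and computes a sample kurtosis as a normalized fourth moment; the function names and the Gaussian data are assumptions for this example only.

```python
import numpy as np

rng = np.random.default_rng(0)

def kmeans_loss(X, centers):
    # Mean squared distance from each point to its closest center
    # (the k-Means quantization error of `centers` on the sample X).
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d2.min(axis=1).mean()

def kurtosis(X):
    # Fourth moment of the distances to the mean, normalized by the
    # squared second moment; large values indicate heavy tails.
    mu = X.mean(axis=0)
    d2 = ((X - mu) ** 2).sum(axis=1)
    return (d2 ** 2).mean() / d2.mean() ** 2

# One arbitrary candidate solution; the paper's bounds hold uniformly
# over all center sets, here we only probe a single one.
centers = rng.normal(size=(3, 2))

# Large sample as a stand-in for the true (here: Gaussian) distribution.
population = rng.normal(size=(100_000, 2))
true_loss = kmeans_loss(population, centers)

print(f"sample kurtosis ~ {kurtosis(population):.2f}")
for m in (100, 1_000, 10_000):
    sample = rng.normal(size=(m, 2))
    dev = abs(kmeans_loss(sample, centers) - true_loss)
    print(f"m = {m:>6}:  |empirical loss - expected loss| = {dev:.4f}")
```

Under the paper's bounded-fourth-moment assumption (which a Gaussian satisfies), the printed deviations should shrink roughly at the $O(m^{-1/2})$ rate as $m$ grows, though any single run is noisy.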

Cite

Text

Bachem et al. "Uniform Deviation Bounds for K-Means Clustering." International Conference on Machine Learning, 2017.

Markdown

[Bachem et al. "Uniform Deviation Bounds for K-Means Clustering." International Conference on Machine Learning, 2017.](https://mlanthology.org/icml/2017/bachem2017icml-uniform/)

BibTeX

@inproceedings{bachem2017icml-uniform,
  title     = {{Uniform Deviation Bounds for K-Means Clustering}},
  author    = {Bachem, Olivier and Lucic, Mario and Hassani, S. Hamed and Krause, Andreas},
  booktitle = {International Conference on Machine Learning},
  year      = {2017},
  pages     = {283--291},
  volume    = {70},
  url       = {https://mlanthology.org/icml/2017/bachem2017icml-uniform/}
}