Sharper Generalization Bounds for Clustering
Abstract
Existing generalization analyses of clustering mainly focus on specific instantiations, such as (kernel) $k$-means, and a unified framework for studying clustering performance is still lacking. Moreover, the existing excess clustering risk bounds are mostly of order $\mathcal{O}(K/\sqrt{n})$ when the underlying distribution has bounded support, where $n$ is the sample size and $K$ is the number of clusters, or of order $\mathcal{O}(K^2/n)$ under strong assumptions on the underlying distribution that are hard to verify in general. In this paper, we propose a unified clustering learning framework and investigate its excess risk bounds, obtaining state-of-the-art upper bounds under mild assumptions. Specifically, we derive sharper bounds of order $\mathcal{O}(K^2/n)$ under mild, easily verifiable assumptions on the covering number of the hypothesis spaces. Moreover, for hard clustering schemes such as (kernel) $k$-means, assuming only that the hypothesis functions are bounded, we improve the upper bounds from order $\mathcal{O}(K/\sqrt{n})$ to $\mathcal{O}(\sqrt{K}/\sqrt{n})$. Furthermore, state-of-the-art bounds of the faster order $\mathcal{O}(K/n)$ are obtained under the covering number assumptions.
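For context, a minimal sketch of the quantity these rates control, written for the standard $k$-means instantiation rather than the paper's unified framework (the notation $W$, $P$, and $\hat{\mathbf{c}}_n$ below is illustrative, not taken from the paper):

$$
W(\mathbf{c}, P) = \mathbb{E}_{X \sim P}\Big[\min_{1 \le k \le K} \|X - c_k\|^2\Big],
\qquad
\mathcal{E}(\hat{\mathbf{c}}_n) = W(\hat{\mathbf{c}}_n, P) - \inf_{\mathbf{c} \in (\mathbb{R}^d)^K} W(\mathbf{c}, P),
$$

where $\mathbf{c} = (c_1, \dots, c_K)$ is a set of $K$ centers and $\hat{\mathbf{c}}_n$ minimizes the empirical version of $W$ over $n$ i.i.d. samples from $P$. The excess clustering risk $\mathcal{E}(\hat{\mathbf{c}}_n)$ is the quantity bounded at the rates $\mathcal{O}(K/\sqrt{n})$, $\mathcal{O}(\sqrt{K}/\sqrt{n})$, $\mathcal{O}(K^2/n)$, and $\mathcal{O}(K/n)$ discussed above.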
Cite

Text:
Li and Liu. "Sharper Generalization Bounds for Clustering." International Conference on Machine Learning, 2021.
BibTeX:
@inproceedings{li2021icml-sharper,
title = {{Sharper Generalization Bounds for Clustering}},
author = {Li, Shaojie and Liu, Yong},
booktitle = {International Conference on Machine Learning},
year = {2021},
pages = {6392-6402},
volume = {139},
url = {https://mlanthology.org/icml/2021/li2021icml-sharper/}
}