Model Selection in Clustering by Uniform Convergence Bounds

Abstract

Unsupervised learning algorithms are designed to extract structure from data samples. Reliable and robust inference requires a guarantee that extracted structures are typical for the data source, i.e., similar structures have to be inferred from a second sample set of the same data source. The overfitting phenomenon in maximum entropy based annealing algorithms is exemplarily studied for a class of histogram clustering models. Bernstein's inequality for large deviations is used to determine the maximally achievable approximation quality parameterized by a minimal temperature. Monte Carlo simulations support the proposed model selection criterion by finite temperature annealing.

Cite

Text

Buhmann and Held. "Model Selection in Clustering by Uniform Convergence Bounds." Neural Information Processing Systems, 1999.

Markdown

[Buhmann and Held. "Model Selection in Clustering by Uniform Convergence Bounds." Neural Information Processing Systems, 1999.](https://mlanthology.org/neurips/1999/buhmann1999neurips-model/)

BibTeX

@inproceedings{buhmann1999neurips-model,
  title     = {{Model Selection in Clustering by Uniform Convergence Bounds}},
  author    = {Buhmann, Joachim M. and Held, Marcus},
  booktitle = {Neural Information Processing Systems},
  year      = {1999},
  pages     = {216-222},
  url       = {https://mlanthology.org/neurips/1999/buhmann1999neurips-model/}
}