Stochastic Neighbor Compression

Abstract

We present Stochastic Neighbor Compression (SNC), an algorithm to compress a dataset for the purpose of k-nearest neighbor (kNN) classification. Given training data, SNC learns a much smaller synthetic dataset that minimizes the stochastic 1-nearest neighbor classification error on the training data. This approach has several appealing properties: due to its small size, the compressed set speeds up kNN testing drastically (up to several orders of magnitude, in our experiments); it makes the kNN classifier substantially more robust to label noise; on 4 of 7 datasets it yields lower test error than kNN on the entire training set, even at compression ratios as low as 2%; finally, the SNC compression leads to impressive speed-ups over kNN even when kNN and SNC are both used with ball-tree data structures, hashing, and LMNN dimensionality reduction, demonstrating that it is complementary to existing state-of-the-art algorithms for speeding up kNN classification and leads to substantial further improvements.
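To make the objective concrete: under the standard stochastic-neighbor formulation, each training point x_i selects a compressed point z_j as its neighbor with probability proportional to exp(-γ²‖x_i − z_j‖²), and SNC adjusts the z_j to maximize the probability that the selected neighbor carries the correct label. The paper optimizes this with conjugate gradient; the sketch below is a minimal NumPy illustration using plain gradient steps, with hypothetical parameter names and defaults (`gamma`, `lr`, `n_iters`), so it should be read as an approximation of the idea rather than the authors' implementation.

```python
import numpy as np

def snc_compress(X, y, m, gamma=1.0, lr=0.05, n_iters=300, seed=0):
    """Learn m synthetic points Z (with fixed labels yz) by minimizing the
    stochastic 1-NN negative log-likelihood of the training labels (X, y)."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(X), size=m, replace=False)
    Z, yz = X[idx].copy(), y[idx].copy()      # init from random training points
    same = (y[:, None] == yz[None, :]).astype(float)   # n x m label agreement
    eps = 1e-12                                        # guards divisions below

    for _ in range(n_iters):
        diff = X[:, None, :] - Z[None, :, :]           # n x m x d
        d2 = (diff ** 2).sum(-1)                       # squared distances, n x m
        # stochastic-neighbor weights, shifted row-wise for numerical stability
        w = np.exp(-(gamma ** 2) * (d2 - d2.min(axis=1, keepdims=True)))
        s = (w * same).sum(axis=1)                     # mass on same-label neighbors
        t = w.sum(axis=1)                              # total neighbor mass
        # analytic ascent direction on sum_i log(s_i / t_i)
        coef = 2 * gamma ** 2 * w * (same / (s[:, None] + eps) - 1.0 / (t[:, None] + eps))
        Z += lr * (coef[:, :, None] * diff).sum(axis=0)
    return Z, yz

# toy usage: two overlapping Gaussian blobs, compressed to 8 points
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (200, 2)), rng.normal(3, 1, (200, 2))])
y = np.repeat([0, 1], 200)
Z, yz = snc_compress(X, y, m=8)
# at test time, classify with plain 1-NN against the compressed set
pred = yz[((X[:, None, :] - Z[None, :, :]) ** 2).sum(-1).argmin(axis=1)]
print("train accuracy with 8 compressed points:", (pred == y).mean())
```

The test-time speed-up in the abstract comes from the last step: 1-NN search runs against the m learned points instead of all n training points, so any accelerator (ball trees, hashing) operates on a far smaller set.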

Cite

Text

Kusner et al. "Stochastic Neighbor Compression." International Conference on Machine Learning, 2014.

Markdown

[Kusner et al. "Stochastic Neighbor Compression." International Conference on Machine Learning, 2014.](https://mlanthology.org/icml/2014/kusner2014icml-stochastic/)

BibTeX

@inproceedings{kusner2014icml-stochastic,
  title     = {{Stochastic Neighbor Compression}},
  author    = {Kusner, Matt and Tyree, Stephen and Weinberger, Kilian and Agrawal, Kunal},
  booktitle = {International Conference on Machine Learning},
  year      = {2014},
  pages     = {622--630},
  volume    = {32},
  url       = {https://mlanthology.org/icml/2014/kusner2014icml-stochastic/}
}