Nearest-Neighbor Sample Compression: Efficiency, Consistency, Infinite Dimensions

Abstract

We examine the Bayes-consistency of a recently proposed 1-nearest-neighbor-based multiclass learning algorithm. This algorithm is derived from sample compression bounds and enjoys the statistical advantages of tight, fully empirical generalization bounds, as well as the algorithmic advantages of a faster runtime and memory savings. We prove that this algorithm is strongly Bayes-consistent in metric spaces with finite doubling dimension --- the first consistency result for an efficient nearest-neighbor sample compression scheme. Rather surprisingly, we discover that this algorithm continues to be Bayes-consistent even in a certain infinite-dimensional setting, in which the basic measure-theoretic conditions on which classic consistency proofs hinge are violated. This is all the more surprising, since it is known that k-NN is not Bayes-consistent in this setting. We pose several challenging open problems for future research.
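To make the high-level description above concrete, here is a minimal, illustrative Python sketch of the general idea behind compression-based 1-nearest-neighbor classification: greedily extract a gamma-net of the training sample (a small prototype subset) and classify new points by their nearest retained prototype. The Euclidean metric, the fixed `gamma`, and the names `build_gamma_net` / `predict_1nn` are assumptions made for illustration only; the algorithm studied in the paper additionally chooses the compression scale by optimizing a fully empirical sample compression bound, which is not reproduced here.

```python
import numpy as np

def build_gamma_net(X, y, gamma):
    """Greedily select a gamma-net of the sample: retained points are pairwise
    more than gamma apart, and every sample point lies within gamma of some
    retained point. Retained points keep their original labels."""
    kept = []  # indices of retained prototypes
    for i in range(len(X)):
        if all(np.linalg.norm(X[i] - X[j]) > gamma for j in kept):
            kept.append(i)
    return X[kept], y[kept]

def predict_1nn(prototypes, labels, x):
    """Classify x by its single nearest prototype (1-NN over the compressed set)."""
    dists = np.linalg.norm(prototypes - x, axis=1)
    return labels[np.argmin(dists)]

# Toy usage on synthetic 2-D data (illustration only).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)
P, L = build_gamma_net(X, y, gamma=0.5)
print(len(P), "prototypes kept;", "prediction:", predict_1nn(P, L, np.array([1.0, 1.0])))
```

Because prediction only consults the compressed prototype set rather than the full sample, this style of scheme yields the runtime and memory savings mentioned in the abstract, while the size of the retained set drives the sample compression generalization bound.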

Cite

Text

Kontorovich et al. "Nearest-Neighbor Sample Compression: Efficiency, Consistency, Infinite Dimensions." Neural Information Processing Systems, 2017.

Markdown

[Kontorovich et al. "Nearest-Neighbor Sample Compression: Efficiency, Consistency, Infinite Dimensions." Neural Information Processing Systems, 2017.](https://mlanthology.org/neurips/2017/kontorovich2017neurips-nearestneighbor/)

BibTeX

@inproceedings{kontorovich2017neurips-nearestneighbor,
  title     = {{Nearest-Neighbor Sample Compression: Efficiency, Consistency, Infinite Dimensions}},
  author    = {Kontorovich, Aryeh and Sabato, Sivan and Weiss, Roi},
  booktitle = {Neural Information Processing Systems},
  year      = {2017},
  pages     = {1573--1583},
  url       = {https://mlanthology.org/neurips/2017/kontorovich2017neurips-nearestneighbor/}
}