Nearest Neighbors in High-Dimensional Data: The Emergence and Influence of Hubs

Radovanovic, Milos; Nanopoulos, Alexandros; Ivanovic, Mirjana

doi:10.1145/1553374.1553485

ICML 2009 pp. 865-872

Abstract

High dimensionality can pose severe difficulties, widely recognized as different aspects of the curse of dimensionality. In this paper we study a new aspect of the curse pertaining to the distribution of $k$-occurrences, i.e., the number of times a point appears among the $k$ nearest neighbors of other points in a data set. We show that, as dimensionality increases, this distribution becomes considerably skewed and hub points emerge (points with very high $k$-occurrences). We examine the origin of this phenomenon, showing that it is an inherent property of high-dimensional vector space, and explore its influence on applications based on measuring distances in vector spaces, notably classification, clustering, and information retrieval.
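
The definition of $k$-occurrences in the abstract can be made concrete with a small brute-force computation. The following is a minimal sketch, not the authors' experimental code: it assumes i.i.d. uniform random points and Euclidean distance (neither is specified in the abstract), and the names `k_occurrences`, `skewness`, and `demo` are illustrative. It counts, for each point, how many other points include it among their $k$ nearest neighbors, and summarizes the resulting distribution by its maximum and its standardized third moment.

```python
import random


def k_occurrences(points, k):
    """For each point, count how many other points list it among their k nearest neighbors."""
    n = len(points)
    counts = [0] * n
    for i in range(n):
        # Squared Euclidean distance from point i to every other point.
        dists = [
            (sum((a - b) ** 2 for a, b in zip(points[i], points[j])), j)
            for j in range(n)
            if j != i
        ]
        dists.sort()
        # Each of the k closest points gains one occurrence.
        for _, j in dists[:k]:
            counts[j] += 1
    return counts


def skewness(values):
    """Standardized third moment; 0.0 if all values are equal."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5 if m2 > 0 else 0.0


def demo(n=200, k=5, dims=(3, 100), seed=0):
    """Return {dimension: (max k-occurrence, skewness)} for uniform random data (an assumption)."""
    rng = random.Random(seed)
    result = {}
    for d in dims:
        pts = [[rng.random() for _ in range(d)] for _ in range(n)]
        counts = k_occurrences(pts, k)
        result[d] = (max(counts), skewness(counts))
    return result


if __name__ == "__main__":
    for d, (largest, skew) in demo().items():
        print(f"d={d}: max k-occurrence={largest}, skewness={skew:.2f}")
```

Every point contributes exactly $k$ neighbor slots, so the $k$-occurrences always sum to $n \cdot k$ and their mean is $k$ regardless of dimensionality. What the abstract describes is a change in the shape of the distribution around that mean. Comparing the two dimensions printed by `demo` is one way to look at that change on synthetic data.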

Cite

Text

Radovanovic et al. "Nearest Neighbors in High-Dimensional Data: The Emergence and Influence of Hubs." International Conference on Machine Learning, 2009. doi:10.1145/1553374.1553485

Markdown

[Radovanovic et al. "Nearest Neighbors in High-Dimensional Data: The Emergence and Influence of Hubs." International Conference on Machine Learning, 2009.](https://mlanthology.org/icml/2009/radovanovic2009icml-nearest/) doi:10.1145/1553374.1553485

BibTeX

@inproceedings{radovanovic2009icml-nearest,
  title     = {{Nearest Neighbors in High-Dimensional Data: The Emergence and Influence of Hubs}},
  author    = {Radovanovic, Milos and Nanopoulos, Alexandros and Ivanovic, Mirjana},
  booktitle = {International Conference on Machine Learning},
  year      = {2009},
  pages     = {865-872},
  doi       = {10.1145/1553374.1553485},
  url       = {https://mlanthology.org/icml/2009/radovanovic2009icml-nearest/}
}