Rapid Distance-Based Outlier Detection via Sampling

Abstract

Distance-based approaches to outlier detection are popular in data mining, as they do not require modeling the underlying probability distribution, which is particularly challenging for high-dimensional data. We present an empirical comparison of various approaches to distance-based outlier detection across a large number of datasets. We report the surprising observation that a simple, sampling-based scheme outperforms state-of-the-art techniques in terms of both efficiency and effectiveness. To better understand this phenomenon, we provide a theoretical analysis of why the sampling-based approach outperforms alternative methods based on k-nearest neighbor search.
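
The abstract does not spell out the sampling-based scheme, so the following is only a minimal sketch of one plausible reading: draw a single small random subsample and score each point by its distance to the nearest sampled point. The function name `sampling_outlier_scores`, the parameters `sample_size` and `seed`, the Euclidean metric, and the choice to exclude a point from its own comparison set are all assumptions made for illustration, not details confirmed by the text above.

```python
import math
import random


def sampling_outlier_scores(points, sample_size=20, seed=0):
    """Score each point by its distance to the nearest member of one random subsample.

    Assumed reading of a "sampling-based scheme"; larger scores suggest outliers.
    """
    rng = random.Random(seed)
    # Draw one subsample of indices without replacement (sample_size is hypothetical).
    sample_idx = rng.sample(range(len(points)), min(sample_size, len(points)))
    scores = []
    for i, p in enumerate(points):
        # Skip the point itself so that sampled points do not trivially score 0.
        dists = [math.dist(p, points[j]) for j in sample_idx if j != i]
        scores.append(min(dists) if dists else 0.0)
    return scores


if __name__ == "__main__":
    rng = random.Random(1)
    # 200 points near the origin plus one point placed far away.
    demo_points = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(200)]
    demo_points.append((50.0, 50.0))
    demo_scores = sampling_outlier_scores(demo_points)
    top = max(range(len(demo_scores)), key=demo_scores.__getitem__)
    print("index with the highest score:", top)
```

Under these assumptions, each point is compared against only `sample_size` points rather than the whole dataset, so scoring costs grow linearly with the number of points. A k-nearest-neighbor score, by contrast, has to search the full dataset for every point.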

Cite

Text

Sugiyama and Borgwardt. "Rapid Distance-Based Outlier Detection via Sampling." Neural Information Processing Systems, 2013.

Markdown

[Sugiyama and Borgwardt. "Rapid Distance-Based Outlier Detection via Sampling." Neural Information Processing Systems, 2013.](https://mlanthology.org/neurips/2013/sugiyama2013neurips-rapid/)

BibTeX

@inproceedings{sugiyama2013neurips-rapid,
  title     = {{Rapid Distance-Based Outlier Detection via Sampling}},
  author    = {Sugiyama, Mahito and Borgwardt, Karsten},
  booktitle = {Neural Information Processing Systems},
  year      = {2013},
  pages     = {467-475},
  url       = {https://mlanthology.org/neurips/2013/sugiyama2013neurips-rapid/}
}