Medoids in Almost-Linear Time via Multi-Armed Bandits

Abstract

Computing the medoid of a large number of points in high-dimensional space is an increasingly common operation in many data science problems. We present an algorithm Med-dit to compute the medoid with high probability, which uses $O(n\log n)$ distance evaluations. Med-dit is based on a connection with the Multi-Armed Bandit problem. We evaluate the performance of Med-dit empirically on the Netflix-prize and single-cell RNA-seq datasets, containing hundreds of thousands of points living in tens of thousands of dimensions, and observe a $5$-$10$x improvement in performance over the current state of the art. We have released the code of Med-dit and our empirical results at https://github.com/bagavi/Meddit.

Cite

Text

Bagaria et al. "Medoids in Almost-Linear Time via Multi-Armed Bandits." International Conference on Artificial Intelligence and Statistics, 2018.

Markdown

[Bagaria et al. "Medoids in Almost-Linear Time via Multi-Armed Bandits." International Conference on Artificial Intelligence and Statistics, 2018.](https://mlanthology.org/aistats/2018/bagaria2018aistats-medoids/)

BibTeX

@inproceedings{bagaria2018aistats-medoids,
  title     = {{Medoids in Almost-Linear Time via Multi-Armed Bandits}},
  author    = {Bagaria, Vivek Kumar and Kamath, Govinda M. and Ntranos, Vasilis and Zhang, Martin J. and Tse, David},
  booktitle = {International Conference on Artificial Intelligence and Statistics},
  year      = {2018},
  pages     = {500-509},
  url       = {https://mlanthology.org/aistats/2018/bagaria2018aistats-medoids/}
}