That Was Fast! Speeding up NN Search of High Dimensional Distributions.

Abstract

We present a data structure for fast nearest neighbor retrieval of generative models of documents based on KL divergence. Our data structure, which shares some similarity with Bregman Ball Trees, consists of a hierarchical partition of a database, and uses a novel branch and bound methodology for search. The main technical contribution of the paper is a novel and efficient algorithm for deciding whether to explore nodes during backtracking, based on a variational approximation. This reduces the number of computations per node, and overcomes the limitations of Bregman Ball Trees on high dimensional data. In addition, our strategy is applicable also to probability distributions with hidden state variables, and is not limited to regular exponential family distributions. Experiments demonstrate substantial speed-ups over both Bregman Ball Trees and over brute force search, on both moderate and high dimensional histogram data. In addition, experiments on linear dynamical systems demonstrate the flexibility of our approach to latent variable models.

Cite

Text

Coviello et al. "That Was Fast! Speeding up NN Search of High Dimensional Distributions.." International Conference on Machine Learning, 2013.

Markdown

[Coviello et al. "That Was Fast! Speeding up NN Search of High Dimensional Distributions.." International Conference on Machine Learning, 2013.](https://mlanthology.org/icml/2013/coviello2013icml-fast/)

BibTeX

@inproceedings{coviello2013icml-fast,
  title     = {{That Was Fast! Speeding up NN Search of High Dimensional Distributions.}},
  author    = {Coviello, Emanuele and Mumtaz, Adeel and Chan, Antoni and Lanckriet, Gert},
  booktitle = {International Conference on Machine Learning},
  year      = {2013},
  pages     = {468-476},
  volume    = {28},
  url       = {https://mlanthology.org/icml/2013/coviello2013icml-fast/}
}