Doubly Approximate Nearest Neighbor Classification

Abstract

Nonparametric classification models, such as K-Nearest Neighbor (KNN), have become particularly powerful tools in machine learning and data mining, due to their simplicity and flexibility. However, the testing time of the KNN classifier becomes unacceptable and the KNN's performance deteriorates significantly when applied to data sets with millions of dimensions. We observe that state-of-the-art approximate nearest neighbor (ANN) methods aim to either reduce the number of distance comparisons based on tree structure or decrease the cost of distance computation by dimension reduction methods. In this paper, we propose a doubly approximate nearest neighbor classification strategy, which marries the two branches which compress the dimensions for decreasing distance computation cost as well as reduce the number of distance comparison instead of full scan. Under this strategy, we build a compressed dimensional tree (CD-Tree) to avoid unnecessary distance calculations. In each decision node, we propose a novel feature selection paradigm by optimizing the feature selection vector as well as the separator (indicator variables for splitting instances) with the maximum margin. An efficient algorithm is then developed to find the globally optimal solution with convergence guarantee. Furthermore, we also provide a data-dependent generalization error bound for our model, which reveals a new insight for the design of ANN classification algorithms. Our empirical studies show that our algorithm consistently obtains competitive or better classification results on all data sets, yet we can also achieve three orders of magnitude faster than state-of-the-art libraries on very high dimensions.

Cite

Text

Liu et al. "Doubly Approximate Nearest Neighbor Classification." AAAI Conference on Artificial Intelligence, 2018. doi:10.1609/AAAI.V32I1.11690

Markdown

[Liu et al. "Doubly Approximate Nearest Neighbor Classification." AAAI Conference on Artificial Intelligence, 2018.](https://mlanthology.org/aaai/2018/liu2018aaai-doubly/) doi:10.1609/AAAI.V32I1.11690

BibTeX

@inproceedings{liu2018aaai-doubly,
  title     = {{Doubly Approximate Nearest Neighbor Classification}},
  author    = {Liu, Weiwei and Liu, Zhuanghua and Tsang, Ivor W. and Zhang, Wenjie and Lin, Xuemin},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {2018},
  pages     = {3683-3690},
  doi       = {10.1609/AAAI.V32I1.11690},
  url       = {https://mlanthology.org/aaai/2018/liu2018aaai-doubly/}
}