Fast near Neighbor Search in High-Dimensional Binary Data

Shrivastava, Anshumali; Li, Ping

doi:10.1007/978-3-642-33460-3_36

Fast near Neighbor Search in High-Dimensional Binary Data

Anshumali Shrivastava, Ping Li

ECML-PKDD 2012 pp. 474-489

doi:10.1007/978-3-642-33460-3_36 /ecmlpkdd/2012/shrivastava2012ecmlpkdd-fast/

Abstract

Numerous applications in search, databases, machine learning, and computer vision, can benefit from efficient algorithms for near neighbor search. This paper proposes a simple framework for fast near neighbor search in high-dimensional binary data , which are common in practice (e.g., text). We develop a very simple and effective strategy for sub-linear time near neighbor search, by creating hash tables directly using the bits generated by b -bit minwise hashing. The advantages of our method are demonstrated through thorough comparisons with two strong baselines: spectral hashing and sign (1-bit) random projections .

PDF ECML-PKDD Semantic Scholar

Cite

Text

Shrivastava and Li. "Fast near Neighbor Search in High-Dimensional Binary Data." European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2012. doi:10.1007/978-3-642-33460-3_36

Markdown

[Shrivastava and Li. "Fast near Neighbor Search in High-Dimensional Binary Data." European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2012.](https://mlanthology.org/ecmlpkdd/2012/shrivastava2012ecmlpkdd-fast/) doi:10.1007/978-3-642-33460-3_36

BibTeX

@inproceedings{shrivastava2012ecmlpkdd-fast,
  title     = {{Fast near Neighbor Search in High-Dimensional Binary Data}},
  author    = {Shrivastava, Anshumali and Li, Ping},
  booktitle = {European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases},
  year      = {2012},
  pages     = {474-489},
  doi       = {10.1007/978-3-642-33460-3_36},
  url       = {https://mlanthology.org/ecmlpkdd/2012/shrivastava2012ecmlpkdd-fast/}
}