Fast near Neighbor Search in High-Dimensional Binary Data
Abstract
Numerous applications in search, databases, machine learning, and computer vision, can benefit from efficient algorithms for near neighbor search. This paper proposes a simple framework for fast near neighbor search in high-dimensional binary data , which are common in practice (e.g., text). We develop a very simple and effective strategy for sub-linear time near neighbor search, by creating hash tables directly using the bits generated by b -bit minwise hashing. The advantages of our method are demonstrated through thorough comparisons with two strong baselines: spectral hashing and sign (1-bit) random projections .
Cite
Text
Shrivastava and Li. "Fast near Neighbor Search in High-Dimensional Binary Data." European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2012. doi:10.1007/978-3-642-33460-3_36Markdown
[Shrivastava and Li. "Fast near Neighbor Search in High-Dimensional Binary Data." European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2012.](https://mlanthology.org/ecmlpkdd/2012/shrivastava2012ecmlpkdd-fast/) doi:10.1007/978-3-642-33460-3_36BibTeX
@inproceedings{shrivastava2012ecmlpkdd-fast,
title = {{Fast near Neighbor Search in High-Dimensional Binary Data}},
author = {Shrivastava, Anshumali and Li, Ping},
booktitle = {European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases},
year = {2012},
pages = {474-489},
doi = {10.1007/978-3-642-33460-3_36},
url = {https://mlanthology.org/ecmlpkdd/2012/shrivastava2012ecmlpkdd-fast/}
}