On Densification for Minwise Hashing

Tung Mai, Anup Rao, Matt Kapilevich, Ryan Rossi, Yasin Abbasi-Yadkori, Ritwik Sinha

UAI 2019 pp. 831-840

/uai/2019/mai2019uai-densification/

Abstract

One Permutation Hashing (OPH) is a significantly more efficient alternative to the popular minwise hashing. To produce a sketch of size $k$, OPH requires just one hash function whereas the classical minwise hashing requires $k$ hash functions. However, OPH does not have the desirable locality sensitive hashing (LSH) property that is important for indexing. [Srivastava and Li 2014, ICML] proposed the novel idea of densification to produce LSH sketches from OPH sketches, and gave the first densification routine. In this paper, we give a necessary and sufficient condition for a densification routine to result in LSH sketches when applied to OPH sketches. Furthermore, we give a novel densification routine that for every input, takes O($k \log k$) time in expectation and achieves better variance than the previous best bound obtained by [Srivastava 2017]. The running time of the densification routine given in [Srivastava 2017] for worst case inputs is O($k^2$) in expectation.

PDF UAI Semantic Scholar

Cite

Text

Mai et al. "On Densification for Minwise Hashing." Uncertainty in Artificial Intelligence, 2019.

Markdown

[Mai et al. "On Densification for Minwise Hashing." Uncertainty in Artificial Intelligence, 2019.](https://mlanthology.org/uai/2019/mai2019uai-densification/)

BibTeX

@inproceedings{mai2019uai-densification,
  title     = {{On Densification for Minwise Hashing}},
  author    = {Mai, Tung and Rao, Anup and Kapilevich, Matt and Rossi, Ryan and Abbasi-Yadkori, Yasin and Sinha, Ritwik},
  booktitle = {Uncertainty in Artificial Intelligence},
  year      = {2019},
  pages     = {831-840},
  volume    = {115},
  url       = {https://mlanthology.org/uai/2019/mai2019uai-densification/}
}