B-Bit Minwise Hashing for Estimating Three-Way Similarities

Abstract

Computing two-way and multi-way set similarities is a fundamental problem. This study focuses on estimating 3-way resemblance (Jaccard similarity) using b-bit minwise hashing. While traditional minwise hashing methods store each hashed value using 64 bits, b-bit minwise hashing only stores the lowest b bits (where b>= 2 for 3-way). The extension to 3-way similarity from the prior work on 2-way similarity is technically non-trivial. We develop the precise estimator which is accurate and very complicated; and we recommend a much simplified estimator suitable for sparse data. Our analysis shows that $b$-bit minwise hashing can normally achieve a 10 to 25-fold improvement in the storage space required for a given estimator accuracy of the 3-way resemblance.

Cite

Text

Li et al. "B-Bit Minwise Hashing for Estimating Three-Way Similarities." Neural Information Processing Systems, 2010.

Markdown

[Li et al. "B-Bit Minwise Hashing for Estimating Three-Way Similarities." Neural Information Processing Systems, 2010.](https://mlanthology.org/neurips/2010/li2010neurips-bbit/)

BibTeX

@inproceedings{li2010neurips-bbit,
  title     = {{B-Bit Minwise Hashing for Estimating Three-Way Similarities}},
  author    = {Li, Ping and Konig, Arnd and Gui, Wenhao},
  booktitle = {Neural Information Processing Systems},
  year      = {2010},
  pages     = {1387-1395},
  url       = {https://mlanthology.org/neurips/2010/li2010neurips-bbit/}
}