Improving Sign-Random-Projection via Count Sketch

Abstract

Computing the angular similarity between pairs of vectors is a core part of various machine learning algorithms. The seminal work of Charikar (a.k.a. Sign-Random-Projection (SRP) or SimHash) provides an unbiased estimate for the same. However, SRP suffers from the following limitations: (i) large variance in the similarity estimation, (ii) and high running time while computing the sketch. There are improved variants that address these limitations. However, they are known to improve on only one aspect in their proposal, for e.g. Yu et al. suggest a faster algorithm, Ji et al., Kang and Wong, provide estimates with a smaller variance. In this work, we propose a sketching algorithm that addresses both aspects in one algorithm – a faster algorithm along with a smaller variance in the similarity estimation. Moreover, our algorithm is space-efficient as well. We present a rigorous theoretical analysis of our proposal and complement it via experiments on synthetic and real-world datasets.

Cite

Text

Dubey et al. "Improving Sign-Random-Projection via Count Sketch." Uncertainty in Artificial Intelligence, 2022.

Markdown

[Dubey et al. "Improving Sign-Random-Projection via Count Sketch." Uncertainty in Artificial Intelligence, 2022.](https://mlanthology.org/uai/2022/dubey2022uai-improving/)

BibTeX

@inproceedings{dubey2022uai-improving,
  title     = {{Improving Sign-Random-Projection via Count Sketch}},
  author    = {Dubey, Punit Pankaj and Verma, Bhisham Dev and Pratap, Rameshwar and Kang, Keegan},
  booktitle = {Uncertainty in Artificial Intelligence},
  year      = {2022},
  pages     = {599-609},
  volume    = {180},
  url       = {https://mlanthology.org/uai/2022/dubey2022uai-improving/}
}