One Sketch for All: Theory and Application of Conditional Random Sampling

Abstract

Conditional Random Sampling (CRS) was originally proposed for efficiently computing pairwise ($l_2$, $l_1$) distances in static, large-scale, and sparse data sets such as text and Web data, and it was previously presented using a heuristic argument. This study extends CRS to handle dynamic or streaming data, which reflects real-world situations much better than the assumption of static data. Compared with other known sketching algorithms for dimensionality reduction, such as stable random projections, CRS exhibits a significant advantage in that it is ``one-sketch-for-all.'' In particular, we demonstrate that CRS can be applied to efficiently compute the $l_p$ distance and the Hilbertian metrics, both of which are popular in machine learning. Although a fully rigorous analysis of CRS is difficult, we prove that, with a simple modification, CRS is rigorous at least for the important application of computing Hamming norms. A generic estimator and an approximate variance formula are provided and tested on several applications, including computing Hamming norms, Hamming distances, and $\chi^2$ distances.
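
For intuition, the following is a minimal, hypothetical Python sketch of the basic CRS idea summarized in the abstract: each sparse vector's coordinates are randomly permuted under a shared permutation, only the first $k$ nonzero entries are stored as the sketch, and a pairwise distance is estimated from the "conditional" sample defined by the smaller of the two sketches' largest sampled coordinates. The function names and the simple scale-up estimator are illustrative assumptions, not the paper's exact estimator or variance formula.

```python
import numpy as np

def crs_sketch(x, k, perm):
    """Sketch a sparse vector x (dict: index -> value) under a shared random
    permutation perm: keep the first k nonzero entries in permuted order.
    Returns a list of (permuted_index, value) pairs sorted by permuted index."""
    permuted = sorted((int(perm[i]), v) for i, v in x.items())
    return permuted[:k]

def crs_estimate_distance(sk_u, sk_v, dim, metric="l2"):
    """Estimate a pairwise distance from two CRS sketches.

    d_s = min(largest permuted index in sk_u, largest permuted index in sk_v) + 1
    defines a conditional random sample over the first d_s permuted coordinates;
    the sample distance is then scaled by dim / d_s.  This plain scaling
    estimator is an assumption for illustration only."""
    d_s = min(sk_u[-1][0], sk_v[-1][0]) + 1  # sample size (coordinates are 0-indexed)
    u = {i: v for i, v in sk_u if i < d_s}
    v = {i: v for i, v in sk_v if i < d_s}
    idx = set(u) | set(v)
    if metric == "l2":
        # Estimate of the *squared* l2 distance.
        sample_dist = sum((u.get(i, 0.0) - v.get(i, 0.0)) ** 2 for i in idx)
    elif metric == "hamming":
        sample_dist = sum(u.get(i, 0.0) != v.get(i, 0.0) for i in idx)
    else:
        raise ValueError(f"unsupported metric: {metric}")
    return sample_dist * dim / d_s

# Usage: two sparse vectors in a 10,000-dimensional space, sketch size k = 50.
rng = np.random.default_rng(0)
dim, k = 10_000, 50
perm = rng.permutation(dim)
x = {int(i): float(rng.standard_normal()) for i in rng.choice(dim, 500, replace=False)}
y = {int(i): float(rng.standard_normal()) for i in rng.choice(dim, 500, replace=False)}
est = crs_estimate_distance(crs_sketch(x, k, perm), crs_sketch(y, k, perm), dim)
```

Because the same per-vector sketch is reused for every pair and every distance type ($l_2$ in this toy example, Hamming, $\chi^2$, and so on), this illustrates the "one-sketch-for-all" property emphasized in the abstract.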

Cite

Text

Li et al. "One Sketch for All: Theory and Application of Conditional Random Sampling." Neural Information Processing Systems, 2008.

Markdown

[Li et al. "One Sketch for All: Theory and Application of Conditional Random Sampling." Neural Information Processing Systems, 2008.](https://mlanthology.org/neurips/2008/li2008neurips-one/)

BibTeX

@inproceedings{li2008neurips-one,
  title     = {{One Sketch for All: Theory and Application of Conditional Random Sampling}},
  author    = {Li, Ping and Church, Kenneth W. and Hastie, Trevor J.},
  booktitle = {Neural Information Processing Systems},
  year      = {2008},
  pages     = {953-960},
  url       = {https://mlanthology.org/neurips/2008/li2008neurips-one/}
}