Variance Reduction in Frequency Estimators via Control Variates Method

Abstract

Generating succinct summaries (also known as sketches) of massive data streams is becoming increasingly important. Such a task typically requires fast, accurate, and small space algorithms in order to support the downstream applications, mainly in areas such as data analysis, machine learning and data mining. A fundamental and well-studied problem in this context is that of estimating the frequencies of the items appearing in a data stream. The Count-Min-Sketch (Cormode and Muthukrishnan, J. Algorithms, 55(1):58–75, 2005) and Count-Sketch (Charikar et al., Theor. Comput. Sci., 312(1):3–15, 2004) are two known classical algorithms for this purpose. However, a limitation of these techniques is that the variance of their estimate tends to be large. In this work, we address this problem and suggest a technique that reduces the variance in their respective estimates, at the cost of little computational overhead. Our technique relies on the classical Control-Variate trick (Lavenberg and Welch, Manage. Sci., 27:322–335, 1981) used for reducing variance in Monte-Carlo simulation. We present a theoretical analysis of our proposal by carefully choosing the control variates and complement them with experiments on synthetic as well as real-world datasets.

Cite

Text

Pratap and Kulkarni. "Variance Reduction in Frequency Estimators via Control Variates Method." Uncertainty in Artificial Intelligence, 2021.

Markdown

[Pratap and Kulkarni. "Variance Reduction in Frequency Estimators via Control Variates Method." Uncertainty in Artificial Intelligence, 2021.](https://mlanthology.org/uai/2021/pratap2021uai-variance/)

BibTeX

@inproceedings{pratap2021uai-variance,
  title     = {{Variance Reduction in Frequency Estimators via Control Variates Method}},
  author    = {Pratap, Rameshwar and Kulkarni, Raghav},
  booktitle = {Uncertainty in Artificial Intelligence},
  year      = {2021},
  pages     = {183-193},
  volume    = {161},
  url       = {https://mlanthology.org/uai/2021/pratap2021uai-variance/}
}