Conformal Frequency Estimation with Sketched Data

Abstract

A flexible conformal inference method is developed to construct confidence intervals for the frequencies of queried objects in very large data sets, based on a much smaller sketch of those data. The approach is data-adaptive and requires no knowledge of the data distribution or of the details of the sketching algorithm; instead, it constructs provably valid frequentist confidence intervals under the sole assumption of data exchangeability. Although our solution is broadly applicable, this paper focuses on applications involving the count-min sketch algorithm and a non-linear variation thereof. The performance is compared to that of frequentist and Bayesian alternatives through simulations and experiments with data sets of SARS-CoV-2 DNA sequences and classic English literature.
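
For readers unfamiliar with the count-min sketch mentioned above, the following is a minimal illustrative sketch in Python, not the authors' implementation. The class name `CountMinSketch`, the `width` and `depth` parameters, and the salted BLAKE2b hashing are assumptions made for this example. It shows only the classical property that the sketch's estimate never falls below the true frequency; the conformal confidence intervals described in the abstract are not reproduced here.

```python
import hashlib
from collections import Counter


class CountMinSketch:
    """Illustrative count-min sketch (hypothetical names, not the paper's code)."""

    def __init__(self, width=64, depth=4):
        self.width = width    # number of counters per row
        self.depth = depth    # number of independent hash rows
        self.table = [[0] * width for _ in range(depth)]

    def _index(self, row, item):
        # Assumption: one salted BLAKE2b hash per row stands in for the
        # pairwise-independent hash family used in the classical analysis.
        digest = hashlib.blake2b(
            item.encode("utf-8"),
            digest_size=8,
            salt=row.to_bytes(8, "little"),
        ).digest()
        return int.from_bytes(digest, "little") % self.width

    def add(self, item):
        # Increment one counter in every row.
        for row in range(self.depth):
            self.table[row][self._index(row, item)] += 1

    def query(self, item):
        # Collisions can only inflate counters, so the minimum over rows
        # is an upper bound on the true frequency.
        return min(
            self.table[row][self._index(row, item)]
            for row in range(self.depth)
        )


# Toy data, invented for illustration only.
words = "the cat and the dog and the bird".split()
sketch = CountMinSketch(width=8, depth=3)
for w in words:
    sketch.add(w)

true_counts = Counter(words)
for w, c in true_counts.items():
    print(w, "true:", c, "sketch estimate:", sketch.query(w))
```

The estimate is a deterministic upper bound on the true count, and it is at most the total number of items inserted. How far it overshoots depends on hash collisions, which is the uncertainty that a confidence interval for the queried frequency is meant to quantify.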

Cite

Text

Sesia and Favaro. "Conformal Frequency Estimation with Sketched Data." Neural Information Processing Systems, 2022.

Markdown

[Sesia and Favaro. "Conformal Frequency Estimation with Sketched Data." Neural Information Processing Systems, 2022.](https://mlanthology.org/neurips/2022/sesia2022neurips-conformal/)

BibTeX

@inproceedings{sesia2022neurips-conformal,
  title     = {{Conformal Frequency Estimation with Sketched Data}},
  author    = {Sesia, Matteo and Favaro, Stefano},
  booktitle = {Neural Information Processing Systems},
  year      = {2022},
  url       = {https://mlanthology.org/neurips/2022/sesia2022neurips-conformal/}
}