Data Structures for Density Estimation

Abstract

We study statistical/computational tradeoffs for the following density estimation problem: given $k$ distributions $v_1, \ldots, v_k$ over a discrete domain of size $n$, and sampling access to a distribution $p$, identify $v_i$ that is "close" to $p$. Our main result is the first data structure that, given a sublinear (in $n$) number of samples from $p$, identifies $v_i$ in time sublinear in $k$. We also give an improved version of the algorithm of Acharya et al. (2018) that reports $v_i$ in time linear in $k$. The experimental evaluation of the latter algorithm shows that it achieves a significant reduction in the number of operations needed to achieve a given accuracy compared to prior work.

Cite

Text

Aamand et al. "Data Structures for Density Estimation." International Conference on Machine Learning, 2023.

Markdown

[Aamand et al. "Data Structures for Density Estimation." International Conference on Machine Learning, 2023.](https://mlanthology.org/icml/2023/aamand2023icml-data/)

BibTeX

@inproceedings{aamand2023icml-data,
  title     = {{Data Structures for Density Estimation}},
  author    = {Aamand, Anders and Andoni, Alexandr and Chen, Justin Y. and Indyk, Piotr and Narayanan, Shyam and Silwal, Sandeep},
  booktitle = {International Conference on Machine Learning},
  year      = {2023},
  pages     = {1-18},
  volume    = {202},
  url       = {https://mlanthology.org/icml/2023/aamand2023icml-data/}
}