Data Structures for Density Estimation
Abstract
We study statistical/computational tradeoffs for the following density estimation problem: given $k$ distributions $v_1, \ldots, v_k$ over a discrete domain of size $n$, and sampling access to a distribution $p$, identify $v_i$ that is "close" to $p$. Our main result is the first data structure that, given a sublinear (in $n$) number of samples from $p$, identifies $v_i$ in time sublinear in $k$. We also give an improved version of the algorithm of Acharya et al. (2018) that reports $v_i$ in time linear in $k$. The experimental evaluation of the latter algorithm shows that it achieves a significant reduction in the number of operations needed to achieve a given accuracy compared to prior work.
Cite
Text
Aamand et al. "Data Structures for Density Estimation." International Conference on Machine Learning, 2023.Markdown
[Aamand et al. "Data Structures for Density Estimation." International Conference on Machine Learning, 2023.](https://mlanthology.org/icml/2023/aamand2023icml-data/)BibTeX
@inproceedings{aamand2023icml-data,
title = {{Data Structures for Density Estimation}},
author = {Aamand, Anders and Andoni, Alexandr and Chen, Justin Y. and Indyk, Piotr and Narayanan, Shyam and Silwal, Sandeep},
booktitle = {International Conference on Machine Learning},
year = {2023},
pages = {1-18},
volume = {202},
url = {https://mlanthology.org/icml/2023/aamand2023icml-data/}
}