On Fine-Grained Distinct Element Estimation
Abstract
We study the problem of distributed distinct element estimation, where $\alpha$ servers each receive a subset of a universe $[n]$ and aim to compute a $(1+\varepsilon)$-approximation to the number of distinct elements using minimal communication. While prior work establishes a worst-case bound of $\Theta\left(\alpha\log n+\frac{\alpha}{\varepsilon^2}\right)$ bits, these results rely on assumptions that may not hold in practice. We introduce a new parameterization based on the number $C = \frac{\beta}{\varepsilon^2}$ of pairwise collisions, i.e., instances where the same element appears on multiple servers, and design a protocol that uses only $O\left(\alpha\log n\log\log n+\frac{\sqrt{\beta}}{\varepsilon^2} \log n\right)$ bits, breaking previous lower bounds when $C$ is small. We further improve our algorithm under assumptions on the number of distinct elements or collisions and provide matching lower bounds in all regimes, establishing $C$ as a tight complexity measure for the problem. Finally, we consider streaming algorithms for distinct element estimation parameterized by the number of items with frequency larger than $1$. Overall, our results offer insight into why statistical problems with known hardness results can be efficiently solved in practice.
Cite
Text
Diakonikolas et al. "On Fine-Grained Distinct Element Estimation." Proceedings of the 42nd International Conference on Machine Learning, 2025.Markdown
[Diakonikolas et al. "On Fine-Grained Distinct Element Estimation." Proceedings of the 42nd International Conference on Machine Learning, 2025.](https://mlanthology.org/icml/2025/diakonikolas2025icml-finegrained/)BibTeX
@inproceedings{diakonikolas2025icml-finegrained,
title = {{On Fine-Grained Distinct Element Estimation}},
author = {Diakonikolas, Ilias and Kane, Daniel and Lee, Jasper C.H. and Pittas, Thanasis and Woodruff, David and Zhou, Samson},
booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
year = {2025},
pages = {13643-13678},
volume = {267},
url = {https://mlanthology.org/icml/2025/diakonikolas2025icml-finegrained/}
}