On Fine-Grained Distinct Element Estimation

Abstract

We study the problem of distributed distinct element estimation, where $\alpha$ servers each receive a subset of a universe $[n]$ and aim to compute a $(1+\varepsilon)$-approximation to the number of distinct elements using minimal communication. While prior work establishes a worst-case bound of $\Theta\left(\alpha\log n+\frac{\alpha}{\varepsilon^2}\right)$ bits, these results rely on assumptions that may not hold in practice. We introduce a new parameterization based on the number $C = \frac{\beta}{\varepsilon^2}$ of pairwise collisions, i.e., instances where the same element appears on multiple servers, and design a protocol that uses only $O\left(\alpha\log n\log\log n+\frac{\sqrt{\beta}}{\varepsilon^2} \log n\right)$ bits, breaking previous lower bounds when $C$ is small. We further improve our algorithm under assumptions on the number of distinct elements or collisions and provide matching lower bounds in all regimes, establishing $C$ as a tight complexity measure for the problem. Finally, we consider streaming algorithms for distinct element estimation parameterized by the number of items with frequency larger than $1$. Overall, our results offer insight into why statistical problems with known hardness results can be efficiently solved in practice.

Cite

Text

Diakonikolas et al. "On Fine-Grained Distinct Element Estimation." Proceedings of the 42nd International Conference on Machine Learning, 2025.

Markdown

[Diakonikolas et al. "On Fine-Grained Distinct Element Estimation." Proceedings of the 42nd International Conference on Machine Learning, 2025.](https://mlanthology.org/icml/2025/diakonikolas2025icml-finegrained/)

BibTeX

@inproceedings{diakonikolas2025icml-finegrained,
  title     = {{On Fine-Grained Distinct Element Estimation}},
  author    = {Diakonikolas, Ilias and Kane, Daniel and Lee, Jasper C.H. and Pittas, Thanasis and Woodruff, David and Zhou, Samson},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  year      = {2025},
  pages     = {13643-13678},
  volume    = {267},
  url       = {https://mlanthology.org/icml/2025/diakonikolas2025icml-finegrained/}
}