Nearly-Linear Time and Massively Parallel Algorithms for $k$-Anonymity

Abstract

$k$-anonymity is a widely-used privacy-preserving concept that ensures each record in a dataset is indistinguishable from at least $k-1$ other records. In this paper, we revisit $k$-anonymity by suppression and give an $O(k)$-approximation algorithm with a nearly-linear runtime of $\tilde{O}(nd + n^{1+1/C^2}/k^{1/C^2})$ for an arbitrary constant $C$, where $n$ is the number of records and $d$ is the number of attributes. Previous algorithms with provable guarantees either (1) achieve the same $O(k)$ approximation ratio but require at least $O(n^2 k)$ runtime, or (2) provide a better $O(\log k)$ approximation ratio at the cost of an impractical $O(n^{2k})$ worst-case runtime for general $d$ and $k$. Our algorithm extends to the Massively Parallel Computation (MPC) model, where it can be adapted into an MPC algorithm requiring $\tilde{O}(\log^{1+\epsilon} n)$ rounds and total space $O(n^{1+1/C^2}(d+k))$. Empirically, we also demonstrate that our algorithmic ideas can be adapted to existing heuristic methods, leading to significant speed-ups while preserving comparable performance. Although~\citep{PS07} introduced improvements to achieve more practical runtimes for their $O(\log k)$-approximation algorithm, its worst-case runtime remains $O(n^{2k})$. A natural question arises: can we develop an algorithm with an $o(k)$ approximation ratio and a polynomial runtime? We investigate the single-point $k$-anonymity problem, where the goal is to select $k-1$ additional records to make a given record indistinguishable. Surprisingly, assuming the dense vs random conjecture in complexity theory, we show that for $n = k^c$, no algorithm can achieve a $k^{1 - O(1/c)}$ approximation in $\mathrm{poly}(k)$ time. This provides evidence of the inherent hardness of the $k$-anonymity problem.

Cite

Text

Aydin et al. "Nearly-Linear Time and Massively Parallel Algorithms for $k$-Anonymity." Advances in Neural Information Processing Systems, 2025.

Markdown

[Aydin et al. "Nearly-Linear Time and Massively Parallel Algorithms for $k$-Anonymity." Advances in Neural Information Processing Systems, 2025.](https://mlanthology.org/neurips/2025/aydin2025neurips-nearlylinear/)

BibTeX

@inproceedings{aydin2025neurips-nearlylinear,
  title     = {{Nearly-Linear Time and Massively Parallel Algorithms for $k$-Anonymity}},
  author    = {Aydin, Kevin and Lin, Honghao and Woodruff, David and Zhong, Peilin},
  booktitle = {Advances in Neural Information Processing Systems},
  year      = {2025},
  url       = {https://mlanthology.org/neurips/2025/aydin2025neurips-nearlylinear/}
}